Telling stories


The UK Research Excellence Framework’s focus on impact is a useful reminder of all the ways that 
science can help society — both economically and by other means. 


People embark on a career in science for many reasons. Some
want to improve the world, others to understand how it works.
But how many foresee that their work will help to resurrect a 
sixteenth-century English warship? 

In the language of twenty-first-century science, such research 
has a new label: impact. Hundreds of thousands of people, after all, 
have queued to see the Tudor timbers of the partially restored Mary 
Rose, salvaged from the sea floor, and now on display in a museum in
Portsmouth, UK. 

They do so thanks to the efforts of physicists, who tested radar 
imaging on the wreck site; marine biologists, who spotted borer worms 
still living in the timber; and chemists, who created nanoparticles to 
prevent the waterlogged wood being damaged by bacterial action. 
Once artefacts had been brought up from the wreck, materials scien- 
tists examined the corrosion on Tudor cannon balls; biomechanics 
experts analysed the arm bones of Tudor archers; and archaeologists 
inspected skulls to reconstruct the faces of the Mary Rose’s crew. And 
all of this work was paid for — at least partially — by the British tax- 
payer, as part of UK investment in publicly funded science. 

If scientists were once coy about the good work that they do, they 
cannot now afford to be. In fact, the British system now demands that 
they boast of the impact their research has on society. For the first time, 
the mammoth multi-year assessment of UK university research, used 
to help rank institutions and allocate grants, included judgements of 
such impact. This is a good thing. 

The case studies and reports from this Research Excellence Frame- 
work assessment have now been published, providing a compendium of 
some 7,000 stories of good done, lives saved and ancient warships fixed 
up. As we discuss on page 150, scholars of research impact are rubbing 
their hands together at the thought of analysing the stories. Preliminary 
text-mining suggests that across many disciplines, studies strewn with 
words justifying the significance or reach of the work — such as ‘million’,
‘major’ and ‘global’ — tended to score more highly than narratives that
over-used words such as ‘research’, ‘university’ and ‘impact’.

Conventional measurements of research impact beyond academia 
seek hard data, not stories. They typically revolve around econometric 
models that try to capture the financial return of investing in science, 
or count small slices of quantifiable business activity, such as patents or 
spin-out companies. To be sure, there are plenty of those examples in 
the case studies. But taken as a whole, the narratives remind us of the 
many broader ways in which taxpayer-funded research ‘pays back’ on its 
investment — and that hard metrics are not the only way to capture this. 

Indeed, one benefit of the focus on broad impact is that individuals 
and institutions that do good work that makes a positive difference to 
people's lives, society and the economy earn recognition — and moti- 
vation — even if they are not producing profound scientific insights. 

There are some practical difficulties of running such an assessment,
especially for the first time. Some researchers say that although they
are pleased to see the results, it was not worth the burden on academics’
time and university budgets involved in collecting the case studies.
And it is true that, although a large set of good-news stories makes
a valuable collection to dip into for advocacy purposes, the narratives
from this particular exercise do not give a comprehensive view.
Universities had to submit only a few of their best examples (and according
to the data, many may have minimized the number of staff members
whose work was submitted, so as to cut down on the number of case
studies that they had to provide). Another problematic area concerns
the difficulty of grading case studies when many different universities
might each claim an influence on a final product (for example,
a drug brought from bench to bedside).


These are teething troubles. The decision 
of the UK funders to grade the case studies, and to use the scores to 
help them to decide the destination of £2 billion (US$3 billion) in 
performance-linked annual funding, meant that universities across 
the country have taken the exercise seriously. The result is a reminder 
of the many ways in which publicly funded research benefits society 
in the United Kingdom and beyond. 

It demonstrates one other important point. Although the ‘impact’
agenda may focus minds and give universities and funders another
way to make science tangible and measurable, the UK exercise shows 
that academics had been committing to impact long before it became 
a buzzword. The impact claimed is recent, within the past 5 years or so, 
but the research on which that impact is based is often up to 20 years old. 

The focus on impact is a new thing, in other words — but the 
creation of impact is not. The more visible those impacts become, the 
better for all concerned.


Spot the difference 


The US measles outbreak highlights why most 
states should reconsider their vaccination rules. 


Over the past decade, increasing numbers of US parents have
chosen not to vaccinate their children against diseases such
as whooping cough, mumps and measles. The consequence
has been a periodic return of these historical scourges, in localized
outbreaks of a few dozen to a few hundred people. These episodes
often appear in local news reports, some of which warn that lower
vaccination rates could result in a nationwide outbreak.

Reading the US news media over the past two weeks, you might
conclude that that day has come. The current US measles outbreak,
which began in December and was first reported in late January, has 
blown up into a national debate over the rights of parents to decide 
whether their children should be vaccinated. But by global standards, 
it is a tempest in a teapot: as of 6 February, measles had struck 121 peo- 
ple in 17 states and the District of Columbia. 

Those numbers are unremarkable. Since October, a measles outbreak 
has affected more than 370 people in Germany; it saw almost 1,800 cases 
in 2013 and more than 1,600 in 2011. The Philippines had more than 
50,000 cases in 2014. The United Kingdom had only 137 cases last year, 
but in both 2012 and 2013 it had close to 2,000 (see page 148). 

In fact, even by US standards, the current outbreak is not unprec- 
edented. Last year, a much larger outbreak was sparked by Amish 
missionaries returning from the Philippines to Ohio, where low 
vaccination rates among the community caused 383 cases. 

Perhaps that incident stayed out of the national spotlight because it 
was an unusual set of circumstances that occurred in an isolated rural 
community. But the current outbreak centres on ‘the happiest place on 
Earth’ — Disneyland in southern California. At least 42 people seem 
to have been exposed to measles at the theme park, which receives an 
estimated 16 million visits a year. 

Fortunately for the public’s health, attention around the outbreak 
has come down in favour of vaccination and against the myths about 
its dangers. Public opinion has turned against parents and physicians 
who are suspicious of vaccines. Two potential Republican presidential 
candidates, Governor Chris Christie of New Jersey and Senator Rand 
Paul of Kentucky, at first declared that parents should have the right to 
decide whether their children are vaccinated, and then had to clarify 
their positions in the face of harsh criticism. 

Whether or not the theme park’s involvement in the episode contrib- 
uted to the media coverage, Disneyland’s cherished place in US culture 
makes it ideal for an infectious-disease outbreak. It is popular with
international tourists eager for a quintessential American experience, who as
a group are less likely than US residents to be vaccinated. The park also
hosts large numbers of infants less than one year old — younger than the
age at which the first measles shot is generally given in the United States.
And Disneyland is at the epicentre of the US anti-vaccine movement.
Although 94.7% of US children entering school at around age 5
are vaccinated against measles, in hundreds of California schools
the percentage of vaccinated children falls well short of the 92%
considered necessary to produce the ‘herd immunity’ that prevents
transmission of the disease. The state’s public-health department
reports that 2.54% of children entered school in 2014 with an exemption
from vaccination based on personal belief.

The federal government has little say in who 
gets a measles shot — those rules are written by individual states. Most, 
like California, allow parents to send their children to school unvacci- 
nated by claiming a religious or philosophical objection to the practice. 
But two — Mississippi and West Virginia — allow only medical excep- 
tions. And that, many observers have argued, is why Mississippi, one of 
the poorest states in the union, has the highest percentage of 5-year-old 
children who have received vaccination for measles, mumps and rubella. 

Last month, the Mississippi state legislature was considering a bill 
to allow the same types of personal-belief exemption that most other 
states allow. But on 3 February, a committee in the state's House of Rep- 
resentatives killed the proposal. On 4 February, legislators in California 
said that they would introduce a bill to adopt the same strict rules as 
Mississippi. And several other states, including Maine, Minnesota and 
Oregon, are considering measures that would require parents to consult 
with a physician about vaccines before being granted an exemption. 

That is a step in the right direction. Parents, of course, have the 
right to decide what is best for their children. But when it comes to 
vaccination, those decisions should be based on complete and accurate 
information about the risks and benefits.


A single light
A year of illumination switches on with a
Nature special issue.


Among the measures approved at the 68th session of the United
Nations General Assembly in December 2013 were resolutions
to develop “a world against violence and violent extremism”
and “measures to eliminate international terrorism”. Against such
targets, the goal of UN resolution A/RES/68/221, passed in the same
session, might seem unambitious: to recognize the importance of light
in the lives of the citizens of the world.

Some 42 days into that effort — officially called the International
Year of Light and Light-based Technologies 2015 — Nature is doing
its bit. In this special issue, we offer a series of articles that explore
how researchers are pushing the properties of light to new extremes,
and the impact that these studies are already having and could have
in future. The print-journal package begins on page 153 and there is
more available online at nature.com/light2015.

Light and science have been entwined for more than a thousand
years; light and life for much longer. This is reflected in the goals of the
UN celebration, from discussions of solar energy and its crucial potential
in tackling energy and climate problems to the societal impact
of artificial light in our cities and homes, and how it guides development.
According to the UN, “the 21st century will depend as much on photonics as the
20th century depended on electronics”. If so, then more of the work
that researchers are engaged in to understand and harness light — to
make light work — will need to move out of the laboratory.

138 | NATURE | VOL 518 | 12 FEBRUARY 2015
© 2015 Macmillan Publishers Limited. All rights reserved

Light inspires, too. The organizers of the UN year of light are 
seeking people to follow in the (chunky) footsteps of writers such as 
Fyodor Dostoyevsky and Johann Wolfgang von Goethe, who wrote 
about optics. They invite those who feel that they have something 
to say about light and any phenomena or feeling connected to it to 
enter a literary competition. Poems, short stories, essays and plays 
are welcome, but must be submitted by the end of next month (see 
go.nature.com/5pqdnt for details). Winning entries will appear in a 
special anthology — published a thousand years after Ibn al-Haytham’s 
classic treatise, Book of Optics (see page 164). 

Light in 2015 may be all about applications and technology, but it 
retains a powerful theoretical pull on the scientific mind. Countless 
children in night-time gardens have been astonished and intrigued by 
the news that the light arriving from distant stars is a historical record 
— the stars themselves could be long gone even as the light carries 
their image on its journey. Generations of students have tried to deci- 
pher whether light is a wave or a particle, and in doing so have come 
to accept that scientific reality demands a greater tolerance of uncer- 
tainty than the textbooks suggest. Albert Einstein’s general theory of 
relativity — the centenary of which is recognized as part of the UN’s 
celebration — has come to represent an intuition warped just as much 
as the light in the gravitational field that it describes. 

Light has outgrown its metaphorical role as an answer to questions; 
light itself remains a puzzle. To solve that puzzle is an ambition that 
deserves the recognition that the coming months will shine on it. As 
the biochemist and author Isaac Asimov put it: “There is a single light 
of science, and to brighten it anywhere is to brighten it everywhere.”


And the winner is: not science

Portrayals of science in the cinema are growing in sophistication — but not
exactly at the speed of light, says Colin Macilwain.

Some Hollywood stardust will sprinkle across the world of science
this week, in the build-up to next weekend's Oscars ceremony in
Los Angeles. Uniquely in the history of the silver screen, two of
the leading contenders for Best Picture concern the lives of two great
scientists, mathematician Alan Turing and physicist Stephen Hawking.

Both films have been widely praised. But unfortunately, in terms of
shedding light on what made these scientists tick — or furthering the
art of film-making — The Imitation Game and The Theory of Everything
each leave a great deal to be desired.

The two films were produced in the United Kingdom, not in Hollywood.
But they each feature a catalogue of clichés, of eccentric scientists
and true love, well worthy of Hollywood in its gory heyday.

You may say that it is too much to ask — but I think scientists deserve
to see major, fact-based feature films about science present their lives
in ways that resonate, at least to some extent, with the world of science
as it really is. Most of us can recognize the authentic when we see it;
in the case of these two films, we don't.

It is ironic that although Hollywood has shown itself capable of
producing, on occasion, complex, postmodern masterpieces such as 2004’s
Crash, film-makers here in the United Kingdom still churn out the sort
of sentimental slop that British satirists used to make a living by
sending up, a quarter of a century ago. (I refer younger and non-UK
readers to the genius of the Comic Strip series.)

The Imitation Game, Morten Tyldum’s portrayal of Alan Turing, is
the greater disappointment of the two. Benedict Cumberbatch’s performance
as Turing has been widely — and justifiably — lauded. But
the script, unfortunately, portrays Turing as a dysfunctional, almost
autistic, individual and trots through clichés of how a ‘genius’ treats
his peers with all the finesse of a children’s fable.

All we learn about the project to bust the German Enigma cipher
in the Second World War is that everyone was doing it all wrong until
our erstwhile, eccentric hero turns up, argues with everyone in sight
and relentlessly ploughs his own furrow, whatever that may be (we are
never told). The film is significantly weaker for saying almost nothing
about the nature of the problem, or about Turing’s role — relative to
others, inside and outside the project’s base at Bletchley Park — in the
conception and implementation of what we now call the computer.

It also groundlessly alleges that Turing’s homosexuality made him
turn a blind eye to a likely spy at Bletchley Park — a piece of worthless
and defamatory melodrama that seems gratuitous, given the ample material
provided by Turing’s real life story.

Greater emotional nourishment, at least, is
forthcoming from The Theory of Everything, in which Eddie Redmayne
skilfully carries the viewer into the world of Stephen Hawking, as his 
body is progressively ravaged by motor neuron disease. 

Hawking is portrayed sympathetically but convincingly, and the film 
addresses the great issues of his life outside science — the impossible 
demands placed on his first wife, Jane, on whose memoir, Travelling to 
Infinity: My Life with Stephen (Alma, 2008), the film is largely based, 
and the lack of support offered to the couple from the outside world. 

Some critics have said that the film ought to have been even harsher. 
The book on which it is based is a softer version of Jane Hawking’s 
earlier memoir, Music to Move the Stars (Macmillan, 1999), now out 
of print. (Intriguingly, second-hand copies are trading on Amazon for 
several hundred pounds.) 

I enjoyed and believed this film — but it 
makes only a cursory effort to describe or 
address Hawking’s scientific trajectory. Given 
his status as perhaps the world’s best-known 
living scientist, there is something unsettling 
about that. 

Both films present a bombastic, simplistic 
and ‘hero-takes-all’ picture of science — a pic- 
ture that is still promoted heartily through the 
Nobel prizes, and by much science writing. 

I prefer the more jaundiced view taken by 
Paul King’s family film Paddington, in which 
geographer Montgomery Clyde is expelled from 
his learned society for failing to kill and bring 
back bears that he has found in Peru. 

As has been widely noted, both audiences 
and critical attention have been shifting from cinema to the smaller 
screen, as television writers adapt to a twenty-first century in which 
people are growing wise to the clichés foisted on them in the past. 

A more-nuanced approach to storytelling has emerged in count- 
less television series, from Breaking Bad to House of Cards. None of 
these, so far, is built around the world of science, but a similar intel- 
ligence shines through the world-beating science-based sitcom, The 
Big Bang Theory. Trite as some of its scripts may be, Big Bang has a 
stronger grasp than either of these movies of how science really works, 
bouncing along on a melee of inspiration, treachery, serendipity and 
teamwork. 

Big Bang’s barrage of cameos, from the likes of physicist Brian Greene 
and even Hawking himself, speaks to its credibility and fan base inside 
the scientific community. Its appeal carries an important message, too: 
scientists are not circus freaks; they are just people, whose work lets 
them express their inner nerd. It would be nice to see something about 
science on the big screen that carried half as much conviction.


Colin Macilwain writes about science policy from Edinburgh, UK. 
e-mail: cfmworldview@googlemail.com 
RESEARCH HIGHLIGHTS
Selections from the scientific literature


Nanoparticles 
stuck on tape 


Researchers have found an 
easy way to deposit metal 
nanoparticles on a surface — 
using sticky tape. 

Adding nanoparticles to a 
surface can give it properties 
such as electrical conductivity. 
Bartosz Grzybowski at 
Northwestern University in 
Evanston, Illinois, and his 
colleagues showed that when 
commercial Scotch tape is 
peeled away, bonds within 
the tape polymer break and 
radicals form on its surface. 
These then react with metal 
salts to produce metal 
nanoparticles on the tape. 

When the team placed 
peeled tape into a solution 
of silver nitrate for several 
hours, the tape turned yellow- 
orange — indicating that silver 
nanoparticles had formed. 
The silver-coated tape showed 
antibacterial activity and 
remained sticky. 

J. Am. Chem. Soc. http://doi.org/zzn (2015)


STEM CELLS


Injected cells fix
brain injury


Cells derived from human
stem cells repair brain damage
in irradiated rats, suggesting a
possible therapy for survivors
of brain cancer.

Radiation treatment of brain
cancer can impair memory,
attention and learning. Viviane
Tabar at the Memorial Sloan
Kettering Cancer Center in
New York and her colleagues
used human embryonic stem
cells to make progenitor cells
that form oligodendrocytes,
which insulate nerve fibres,
boosting the speed of electrical
impulses. The team injected
these cells into the brains of
rats that had been exposed
to radiation. The animals did
better at learning and memory
tasks than irradiated rats that
had not received cells, and
about as well as untreated rats.
Analysis of rat brain tissue
revealed that the transplanted
cells re-insulated nerves in
many parts of the brain.
Cell Stem Cell 16, 198-210 (2015)


ECOLOGY


Bee behaviour sees
colonies collapse


Honeybee colonies could be
collapsing because younger
bees are flying out to forage,
raising their risk of death.
Many bee colonies are
failing, probably because
of parasites, pathogens and
pesticides. Bees react to
such stressors by foraging
at a younger age, so to
learn how this might cause
rapid population declines,
Andrew Barron at Macquarie
University in Sydney and his
colleagues radio-tagged bees
in experimental colonies to
monitor their flight behaviour.
The insects that began foraging
earlier in life completed fewer
successful trips and had a lower
survival rate than those that
foraged at the normal age.
Mathematical models
showed that the resulting
decrease in food for the colony
and the increased forager
mortality over time led to rapid
colony collapse. The authors
suggest that supplemental
feeding of colonies could help
to stave off bee declines.
Proc. Natl Acad. Sci. USA
http://dx.doi.org/10.1073/pnas.1422089112 (2015)


Capsules collect carbon dioxide


Microcapsules containing a liquid carbonate
solvent could capture carbon dioxide from power
plants more efficiently than existing methods.
Currently, CO2 is captured at power plants
by passing the flue gas over a solution of liquid
monoethanolamine. The liquid is corrosive,
forms toxic by-products and must be heated
to high temperatures to recover the CO2 and
regenerate the solvent. Jennifer Lewis and her
colleagues at Harvard University in Cambridge,
Massachusetts, created microcapsules made
of a highly porous silicone skin containing a
carbonate solvent. These solvents absorb CO2
slowly, but encapsulation of solvent boosts the
absorption rate tenfold (compared to pools of
liquid carbonate) by increasing the surface area.
The capsules (pictured) are chemically stable
and environmentally benign, and CO2 can be
recovered by modest heating.
Nature Commun. 6, 6124 (2015)


CLIMATE CHANGE 


Aerosols reduce 
Arctic warming 


Particles suspended in the 
atmosphere have decreased the 
amount of warming caused by 
greenhouse gases in the Arctic, 
but this could change as future 
air pollution is reduced. 
Aerosols have a cooling 
effect by reflecting sunlight 
back into space. Mohammad 
Reza Najafi at the University 
of Victoria in Canada and 
his colleagues analysed nine 
climate models running from
1913 to 2012, comparing
simulations with and without 
greenhouse gases, aerosols and 
other climate drivers. Their 
results show that aerosols have 
offset 1.3-2.2°C of Arctic 
warming from greenhouse 
gases, limiting the observed 
warming to 1.2°C. With 
aerosol emissions projected to 
drop in the coming decades, 
the rate of the warming is likely 
to increase. 

The team says that its results 
underscore the reliability of the 
climate models, which simulate 
8.3°C of warming in the Arctic 
in a high-emissions scenario by
the end of the century. 

Nature Clim. Change
http://dx.doi.org/10.1038/nclimate2524 (2015)


Chimps learn new 
calls for food 


Captive chimpanzees learn new 
grunts from neighbours to refer 
to foods — the first evidence 
of such behaviour in non- 
humans, according to a study. 
To see whether chimps 
(pictured) show flexibility 
in the calls they use to refer 
to everyday objects, Simon 
Townsend at the University 
of Zurich, Switzerland, and 
his team compared the grunts 
of seven chimps that were 
moved from a safari park in
the Netherlands to join six
chimps in a UK zoo. A year
after the move, the Dutch 
chimps referred to apples with 
a high-pitched call, in contrast 
to the deep-timbred grunts of 
the UK chimps. But after three 
years, the Dutch chimps had 
adopted their neighbours’ calls. 
The findings suggest that 
social learning of referential 
words in humans could have 


a longer evolutionary history 
than was thought. 
Curr. Biol. http://doi.org/zzd (2015) 


Deep-brain zap 
for addiction 


An electric current sent deep 
into the brain, together with a 
therapeutic drug, can reverse 
the symptoms of cocaine 
addiction in mice. 

Christian Lüscher at the
University of Geneva in
Switzerland and his colleagues
implanted an electrode into
the brains of cocaine-addicted
mice. Stimulating the animals’
neurons at a low frequency only
temporarily relieved symptoms
of addiction after the mice
were injected with cocaine.

But when the researchers
also gave the animals a drug
that blocks receptors for the
neurotransmitter dopamine
— involved in addiction and
reward — the symptoms abated
for longer. Neural connections
that were overactive because
of cocaine exposure also
functioned normally again.

The researchers say that this 
approach could be a potential 
therapy for humans with 
addiction and other neural 
disorders. 

Science 347, 659-664 (2015) 


Soggy climates 
affect language 


In warm, moist climates, 
human languages developed 
with more complex linguistic 
tones than did those in colder, 
drier regions. 
It is thought that language 
is not influenced by ecological 
factors. However, Caleb Everett 
at the University of Miami in 
Coral Gables, Florida, and 
his colleagues concluded 
the opposite after looking at 
studies of vocal-cord biology 
and comparing the geographic 
origins of more than 
3,700 languages with 
the humidity and annual 
average temperatures of 
those regions. The
vocal-cord data showed
that cords produce sounds of
varying pitch more accurately
in moist air than in dry air.
Languages with complex
tone, such as Mandarin
Chinese, originated mainly
in warm, moist climates,
whereas languages such as
English, which have little or
no tone, came from arid or
cold regions.

Proc. Natl Acad. Sci. USA 112,
1322-1327 (2015)


SOCIAL SELECTION


Popular articles
on social media


Lab size matters for productivity


To publish the most papers, labs should ideally have 10 to
15 members, according to a much-discussed study in PeerJ
PrePrints. Adding more graduate students and postdocs
beyond that number does not guarantee a continued rise in
high-impact papers, the study found, partly because the extra
workers tend to be much less productive than the principal
investigator (PI). Mark Pallen, who heads a microbiology
lab at the University of Warwick, UK, tweeted “Nice that PIs
matter!” But Jessica Chong, a geneticist and postdoc at the
University of Washington in Seattle, called it an “odd analysis”
on Twitter, adding, “we expect PIs to ‘produce’ more papers
than any other lab member. They're authors on all papers!”


PeerJ Prepr. 3, e812v1 (2015)


Based on data from altmetric.com.
Altmetric is supported by Macmillan
Science and Education, which owns
Nature Publishing Group.


EVOLUTION 


A hint at how 
hearing evolved 


Early four-legged vertebrates 
may have been able to hear 
sounds on land, even though 
they lacked key ear structures. 
Christian Christensen at 
Aarhus University in Denmark 
and his colleagues studied the 
hearing of the African lungfish 
(Protopterus annectens; 
pictured), the closest living 
relative of early tetrapods 
that began moving onto land 
around 350 million years ago. 
The middle ear, which 
senses changes in air pressure 
caused by sound, is missing in 
lungfish. The researchers found 
that low-frequency sounds in 
air caused the lungfish’s head to
vibrate. Its brain responded to
these frequencies, suggesting 
that the animal detects airborne 
sounds using the vibrations. 
Another study by some of 
the same authors looked at 
salamanders, which also lack 
middle ear structures and live 
in water and on land. The team 
showed that even juvenile 
salamanders, which are fully 
aquatic, can detect sound in air. 
The findings suggest that 
early tetrapods were pre- 
equipped to hear sounds in air, 
which probably helped them 
to adapt to life on land and 
eventually led to more-modern 
middle ear structures. 
J. Exp. Biol. 218, 381-387 (2015); 
Proc. R. Soc. B 282, 20141943
(2015) 


> NATURE.COM 

For the latest research published by 
Nature visit: 
www.nature.com/latestresearch 
SEVEN DAYS


POLICY 


US security 

US President Barack Obama
has incorporated the dangers
posed by global warming and
poverty into a new national-
security strategy. Released
on 6 February, the document
says that both economic
and environmental threats
can hamper growth, foster
extremism and ultimately
lead to military interventions.
The document also highlights
the role of scientific and
technological innovation in
promoting cleaner domestic
energy, stating that the United
States “can and will lead
the global economy while
reducing our emissions”.
The White House’s previous
national-security strategy was
outlined in 2010.


Three-person IVF
The UK House of Commons
voted on 3 February to legalize
a gene-therapy technique
that could help women
to avoid passing genetic
defects on to their children
through mutations in their
mitochondria — the cell’s
energy-producing structures.
The technique, known as
mitochondrial replacement,
uses healthy donated eggs
instead of the mother’s
diseased eggs to create
‘three-person’ embryos for in vitro
fertilization (IVF). The UK
House of Lords must also
approve the measure, which
would authorize the country’s
fertility regulator to allow
mitochondrial replacement in
the future. See page 145
for more.


Bee petition

A coalition of 11 US
environmental groups has
urged President Barack Obama
to significantly toughen
rules governing the use of
neonicotinoid pesticides,
which have been linked to
declines in bee numbers. In
a letter to the White House,
groups including the National
Audubon Society, the Sierra
Club and Friends of the Earth
called for more research on the
impact of these chemicals on
pollinators. They also propose
a ban on treating seeds with
neonicotinoids because this
affects the entire plant. A
federal strategy for dealing
with bee health is expected in
the coming months.


Clean-coal cut

The US Department of Energy
has pulled out of its flagship
clean-coal project, dubbed
FutureGen, more than a
decade after it was proposed.
The US$1.7-billion
demonstration project,
modified in 2010, was
intended to provide
climate-friendly power
by retrofitting a coal-fired
power plant in Illinois to
capture carbon dioxide and
pipe it into a saline aquifer
some 1,200 metres below
ground. The department had
committed $1 billion to the
project, but pulled out owing
to ongoing questions about
private investments. By law,
the federal money had to be
spent by September this year.
See go.nature.com/2pymmo
for more.


Neil Armstrong's Moon bag revealed


A white bag used by Neil Armstrong during his
1969 Moon landing was made public last week.
It was found in a cupboard by Armstrong's
widow, Carol, after his death in 2012. The
contents (pictured), including a camera and
waist tether, were analysed by curators at the
Smithsonian National Air and Space Museum
in Washington DC, who concluded that the
items were from the Eagle lunar module (LM).
Transcripts of Armstrong speaking to astronaut
Michael Collins helped to confirm the bag’s
authenticity: “That one’s just a bunch of trash
that we want to take back — LM parts, odds and
ends, and it won’t stay closed by itself.”


Japan Venus probe 
Japan's errant Akatsuki probe
is to get another chance at
studying the meteorology of
Venus after a failed attempt
in 2010. The Japan Aerospace
Exploration Agency
announced on 6 February
that it will try to insert the
¥25-billion (US$211-million)
craft into Venus’s orbit this
December. If this is successful,
Akatsuki will use remote
sensing to observe the planet’s
clouds, atmosphere, lightning
and surface conditions,
allowing a comparison with
similar mechanisms on Earth.
The probe has been orbiting
the Sun since December
2010, after a malfunctioning
thruster stopped it
decelerating enough to drop
into the planet’s orbit.


Aquarium census 
The Shedd Aquarium in
Chicago, Illinois, will take the
first census of microbes that
live in aquariums. As part of
the Aquarium Microbiome
Project, which launched on
3 February, researchers from
Shedd and partner institutions
will analyse how the microbes
living in the aquarium’s tanks
and on resident sea animals
differ from those in natural
aquatic environments. The
project will also assess the
impact of pollutants on
microbial ecosystems, and
plans to release its first data
later this year.


PEOPLE


FDA chief quits

The commissioner of
the US Food and Drug
Administration (FDA)
announced her resignation on
5 February. Margaret Hamburg
(pictured) has led the
regulatory agency for nearly
six years, during which time
it established programmes
to speed up drug approvals,
laid the groundwork to
regulate electronic cigarettes,
and proposed guidelines for
regulating medical-diagnostic
tests. Hamburg will continue in
the post until the end of March.
The FDA’s chief scientist,
Stephen Ostroff, will serve as
acting commissioner, but the
agency is yet to announce a
permanent replacement.


Physicist dies

Val Fitch, the physicist who,
with James Cronin, discovered
a fundamental asymmetry
between matter and antimatter,
died on 5 February, aged 91.
In 1964, while both were at
Princeton University in New
Jersey, Fitch and Cronin
showed that particles of
antimatter do not simply
behave as mirror-symmetry
counterparts of matter
particles. This violation of a
law known as charge-parity
symmetry is believed to be the
reason that the Big Bang did not
produce a Universe that is equal
parts matter and antimatter.
The two physicists were
awarded the 1980 Nobel Prize
in Physics for their discovery.


Climate libel

A Canadian climate scientist
has been awarded Can$50,000
(US$40,200) in a libel lawsuit
against the National Post
newspaper. Andrew Weaver,
now also a politician for his
province, challenged four Post
articles published in 2009-10
in the wake of the ‘Climategate’
scandal, when hacked e-mails
from UK climate scientists
were made public. The articles
called Weaver a “climate
alarmist”, and he said that they
implied he was untrustworthy
and unscientific. The
5 February ruling by the
British Columbia Supreme
Court says the Post should
take down the articles and
that they “adversely impact on
Dr. Weaver's reputation and
integrity as a scientist”.


Disease initiative

The Global Health Innovative
Technology Fund in Tokyo
launched a programme
on 5 February to spur the
development of drugs,
vaccines and diagnostic
tools for infectious diseases
prevalent in developing
countries. The Grand
Challenges initiative
will invest ¥234 million
(US$2 million) per year in
research into diseases such
as malaria, tuberculosis and
Chagas disease. The first
grant-winners are expected
to be announced in August.
The fund was founded in April
2013 with money from the Bill
& Melinda Gates Foundation
in Seattle, Washington, the
Japanese government and
six Japanese pharmaceutical
companies.


UK science funding

The United Kingdom's science
academies are calling for a
huge hike in research spending
from whichever political party
wins the country’s election in
May. Representatives of the
British Academy, the Royal
Academy of Engineering,
the Royal Society and the
Academy of Medical Sciences
said on 10 February that the
next government should aim
to spend 3% of gross domestic
product (GDP) on research
and development by 2020,
with 1% of this coming from
the public purse. The current
UK research spend is 1.73%
of GDP, of which 0.5% is from
the public sector. Government
science spending has declined
in real terms since 2010.


12-16 FEBRUARY
The American
Association for the
Advancement of
Science's annual meeting
takes place in San Jose,
California. It will focus
on how information and
imaging technologies are
transforming science.
go.nature.com/kprsx8

14 FEBRUARY
The Rosetta craft
is due for its closest
encounter yet with
comet 67P/Churyumov-
Gerasimenko. The
probe will swoop just
6 kilometres from the
surface, and with the
Sun at its back should
get the first shadow-free
images of the comet. But
it will not be specifically
searching for its lost
partner, the lander
Philae.


TREND WATCH

Manuscripts posted to the
preprint server arXiv show how
cosmologists rapidly embraced,
then began to doubt and lose
interest in, one of 2014’s most
sensational announcements:
the discovery of gravitational
waves from the birth of the
Universe. Soon after the
March announcement, papers
questioning the result started to
emerge. The final nail in the coffin
came last month, when researchers
conceded that dust in the Milky
Way accounted for the signal seen
by the telescope BICEP2.

THE TRAIL OF DUST
Interest waned in the BICEP2 experiment as it became clearer
that dust had been mistaken for a signal of gravitational waves.
SOURCE: PAUL GINSPARG/ARXIV


‘Cash for retirement’ idea draws ire p.146
Inside the race to eradicate a killer disease p.148
Words that triumphed in UK research audit p.150
Physicists find ways to see through walls p.158
Three-person in vitro fertilization prevents women from passing on potentially harmful mutations in mitochondrial DNA.


REPRODUCTIVE BIOLOGY 


World hails embryo vote 


UK move to allow pioneering fertility technique could spur other countries to relax rules too. 


BY EWEN CALLAWAY 


Following a 3 February vote in the
UK House of Commons, the world may
once again look to Britain to lead in
fertility treatments, 37 years after in vitro
fertilization (IVF) was pioneered in the country.
The vote lifts a ban on gene-altering
fertilization techniques known as mitochondrial
replacement, or three-person IVF, in which
mitochondria (the cell’s energy-processing
structures) from a donor’s egg cell contribute
to a couple’s embryo. The procedures are
intended to prevent the transmission of
diseases caused by mutations in mitochondrial
DNA. The vote, won by 382 in favour versus
128 against, will still need to be confirmed by
the House of Lords, which is widely expected
to pass the law. Once approved, the Human
Fertilisation and Embryology Authority
(HFEA), Britain’s fertility regulator, will be
allowed to license clinics to carry out the
procedures from October, although it could be
some time before the first human trials begin.

Many reproductive biologists see this as a
step that will affect the field on a global scale.
“We've been hoping that the UK will take the
lead,” says Shoukhrat Mitalipov, a stem-cell
scientist at Oregon Health & Science University
in Portland. His team hopes to apply to
the US Food and Drug Administration (FDA)
for permission to conduct clinical trials of
mitochondrial replacement. Although its
regulatory debate is a bit behind the United
Kingdom’s, “the US is going down the same
path”, Mitalipov says.

An estimated 1 in 5,000 children are born 
with diseases caused by mitochondrial muta- 
tions, which typically affect energy-hungry 
tissues such as the brain, heart and muscles. 
All mitochondrial DNA is inherited from 
the mother, and some women carry harm- 
ful mitochondrial mutations without having 
symptoms themselves. Their children can 
experience debilitating and sometimes
fatal conditions such as muscular
dystrophy or heart disorders.

Before giving a green light to clinics wishing 
to offer the treatment, the HFEA will probably 
want further evidence that the procedure is safe, 
and will vet applications on a case-by-case basis. 

The United Kingdom was one of the few 
countries that explicitly banned mitochon- 
drial replacement by law. In many countries, 
including China and Japan, the techniques 
are prevented by regulations that should be 
simpler to overturn, without legislative inter- 
vention, says Tetsuya Ishii, a bioethicist at 
Hokkaido University in Japan. 

The same applies to the United States.
Since 2001, the FDA has enforced a moratorium
on mitochondrial replacement,
after a New Jersey fertility clinic conducted 
a related procedure to improve conception 
rates. Mitalipov’s push to launch a clinical 
trial of mitochondrial replacement set off a 
series of scientific, ethical and policy reviews 
that are still under way. 

In February 2014, an FDA advisory panel 
held a two-day meeting to consider the science 
of mitochondrial replacement. The panel iden- 
tified areas in which it wanted to see more data, 
such as the long-term health of monkeys con- 
ceived through the procedures, before it could 
move to allow mitochondrial replacement.
It will probably take two to five years to fill
in these gaps, says Evan Snyder, a stem- 
cell biologist at Sanford-Burnham Medical 
Research Institute in La Jolla, California, who 
is chair of the FDA panel. 

Australia is also pondering three-person
IVF. Although the country’s lawmakers opted
against relaxing the rules after a 2011 review 
of human-cloning legislation, the UK vote 
will “provide enormous ammunition” for 
those seeking changes, says David Thorburn, 
a geneticist at the University of Melbourne, 
Australia. Still, he adds, “my gut feeling is that 
it’s unlikely to succeed until this has been done
in practice in the UK.”


NIH ponders ‘emeritus grants’ 


A proposal to pay senior biomedical researchers to wind down their labs draws scepticism. 


BY BOER DENG 


For years, biomedical researchers in the
United States have warned of a worrisome
trend: as competition for grants increases,
younger scientists are finding it harder to keep
up. In 1980, the average age at which researchers
received their first major award from the
US National Institutes of Health (NIH) was 38;
by 2013, this had risen to more than 45 (ref. 1).
And the overall share of grant funding won by
scientists younger than 36 withered from 5.6%
in 1980 to just 1.2% in 2012.

Like ageing Crown princes, junior biomedical
researchers in the United States face long
years as leaders-in-waiting. Now, in a 3 February
posting, the NIH has asked researchers
whether the agency would be wise to give
‘emeritus grants’ to senior scientists to induce
them to wrap up their research. The funding
would “help to ensure the orderly transition of
an experienced researcher's work when they
wish to go on to something else, and also to
recognize their legacy”, says Sally Rockey, the
NIH’s deputy director for extramural research.
If entrenched grant recipients leave the lab, the
NIH hopes, more money will be available for
early-career scientists.

Those who support the idea say that it could
ease the pressure on senior researchers to
continue working in order to bolster their
retirement accounts, which in the United States
largely depend on employee contributions.
The evidence for this is anecdotal, however,
and proponents of emeritus grants admit that
few senior researchers complain that they lack
money to close their labs.

But judging from more than 100 comments
left on Rockey’s widely read blog, many
researchers are highly sceptical of the plan, and
are incensed by what they perceive as a
retirement bonus for the already better-resourced.
“The idea of allocating precious limited federal
research dollars to a special ‘emeritus’ award
appears, at best, tone deaf, and at worst,
suggests underlying biases within the NIH that
favour established researchers,” says
neuroscientist Benjamin Saunders, a postdoctoral
researcher at Johns Hopkins University in
Baltimore, Maryland.

Economic research suggests that paying
older scientists to abandon their labs is unlikely
to be the most effective way for the NIH to
achieve its ultimate goal. Policies for adjusting
markets work better when they are direct,
says labour economist Richard Freeman of
the National Bureau of Economic Research in
Cambridge, Massachusetts. “If your goal is to
have more young researchers have independent
awards and positions, it would be more
efficient just to give them that,” he says. “Any
time you try indirect methods, there is much
more uncertainty as to what will happen.”

But the NIH has had mixed success with
policies designed to give more money to new
investigators. Since 2007, the percentage of
grants won by new applicants has approached
the share reaped by experienced scientists (see
‘Age gap’), but critics say that funded proposals
from younger researchers are of lower quality
than those from older scientists. And despite
the NIH’s efforts, the average age at which a
researcher wins his or her first award has not
declined.

This may be partly because of broader
demographic changes in the biomedical workforce.
About 1 in 3 working scientists was over 50
in 2010, compared with 1 in 5 in 1993. This
helps to explain why the average age of NIH
principal investigators has risen. The age of
first innovation itself might be increasing, too,
according to analyses of patent filings and the
age at which Nobel laureates win their prizes.
Benjamin Jones, an economist at Northwestern
University in Evanston, Illinois, has found
that over the past century there has been a shift
towards productive science at older ages,
perhaps because innovation now requires more
knowledge².

An even bigger challenge is an imbalance
between the healthy supply of young
scientists and the number of senior-level jobs,
says Michael Teitelbaum, a demographer at
Harvard Law School in Cambridge,
Massachusetts. The problem has been exacerbated by
erratic NIH funding. From 1998 to 2003, the
agency’s budget doubled, to US$27.2 billion.
Flush with grant money, academic research
centres expanded, making jobs for
biomedical-science graduates plentiful and attracting more
students to the field.

Fortunes subsequently reversed. Since 2003,
the NIH’s budget has contracted by around
25% in real terms, increasing competition for
dwindling grant money among the surplus of
early-career scientists created during the boom.

Without steady growth in the NIH budget,
some have suggested that the solution is to
train fewer graduates for careers in biomedical
research. But the pipeline of new investigators
shows no signs of drying up. In 2013, US
universities conferred 8,471 biomedical PhDs. These
joined thousands of other researchers eligible
that year for the NIH’s Early Stage Investigator
awards: 785 grants aimed at researchers who
had graduated in the past decade. Too many
heirs are awaiting too few crowns.


AGE GAP

The US National Institutes of Health has sought
to increase funding for new investigators, with
mixed results.
SOURCE: NIH
Ground finches (left) tend to have large beaks for cracking seeds, whereas warbler finches spear insects.


EVOLUTIONARY BIOLOGY 


Darwin’s finches 
join genome club 


Scientists pinpoint genes behind famed beak variations. 


BY GEOFF MARSH 


Researchers have sequenced the genomes
of all 15 species of Darwin's finches,
revealing a key gene responsible for the
diversity in the birds’ beaks. The study,
published online in Nature this week¹, also redraws
the family tree of these iconic birds, whose facial
variations helped Charles Darwin to formulate
his theory of natural selection.

The finches are endemic to Ecuador's Galapa- 
gos archipelago and Costa Rica’s Cocos Island. 
Their beaks are adapted to their preferred food: 
warbler finches, for example, spear insects with 
thin, sharp beaks, whereas ground finches crack 
open seeds with strong, blunter beaks. The birds
are a textbook example of adaptive radiation, in
which a single ancestor responds to a selective 
pressure — in this case, food availability — by 
diversifying into several species. 

Darwin was the first to note this, during his
groundbreaking 1831-36 voyage aboard the
HMS Beagle. “One might really fancy,” he wrote
in his diary, “that from an original paucity of
birds in this archipelago, one species had been
taken and modified for different ends.” Almost
two centuries later, his early suspicions have
been widely confirmed.

Initially, the finches were classified on the 
basis of their physical characteristics. More
recently, classification has incorporated variations
in key DNA sequences. But nobody had compared
whole-genome data from all 15 species until 
a team led by Leif Andersson, a geneticist at 
Uppsala University in Sweden, analysed sam- 
ples from 120 individual birds. “When we did 
the whole DNA sequence of all the species, we 
could redraw that tree,” he says. 

Overall, the researchers found good agree- 
ment with current taxonomy, but also some 
interesting deviations. For example, they conclude
that the ground finch Geospiza difficilis,
which is spread across six islands, actually
comprises three species.

Andersson's team also discovered extensive 
mixing of genes between species. This is in line 
with field observations of hybrid birds made by 
study co-authors Peter and Rosemary Grant, 
evolutionary biologists at Princeton University 
in New Jersey who have worked in the Galapa- 
gos for decades. The genomic data reveal that 
the birds have been crossbreeding throughout 
their evolutionary history. 

Darwin famously sketched his initial idea of
phylogeny as a branching tree, above which he
wrote “I think”. Now, says Peter Grant, “he might
wish to redraw that tree by making connections
between some of the branches, representing the
hybridization and gene exchange”.

By looking at closely related finches that have 
different beak shapes, the researchers were able 
to pinpoint the genes responsible for beak mor- 
phology. One of those genes, ALX1, is involved 
in the facial development of vertebrates, includ- 
ing fish and mammals. In humans, for example, 
loss of ALX1 leads to severe facial deformities².

In the finches, the gene displayed two distinct 
variants that matched up neatly with beak shape. 
Individuals from a species with a highly vari- 
able beak shape — the medium ground finch 
(Geospiza fortis) — had a mixture of the blunt 
and pointed gene variants. The finding dove- 
tails nicely with work by the Grants that docu- 
ments the species’ rapid evolution as recently 
as the 1980s, when a drought affected the bird’s 
food supply and its beak started to become more 
pointed to accommodate a new diet³.

Andersson suspects that ALX1 drove that 
adaptation, but others say the picture is more 
complicated. Beaks “differ in many parameters,
not just being blunt or pointed”, says Ricardo
Mallarino, an evolutionary biologist at Harvard 
University in Cambridge, Massachusetts. Func- 
tional studies of ALX1 should help to reveal 
exactly what the gene controls, he says. His col- 
league, evolutionary biologist Arkhat Abzhanov, 
says that ALX1 may be especially important for 
finches with very specialized beaks. 

What would Darwin make of the findings?
“We would have to give him a crash course in
genetics,” Grant says. “But then he would be
delighted. The results are entirely consistent
with his ideas.”
1. Lamichhaney, S. et al. Nature http://dx.doi.org/10.1038/nature14181 (2015).
2. Uz, E. et al. Am. J. Hum. Genet. 86, 789-796 (2010).
3. Grant, P. R. & Grant, B. R. 40 Years of Evolution: Darwin's Finches on Daphne Major Island (Princeton Univ. Press, 2014).
Measles
A race to 
eradication 


The US media is abuzz after an 
outbreak of measles in Disneyland, 
but the disease will keep popping 
up until it is wiped out worldwide. 


BY DECLAN BUTLER 


Measles debate has reached fever pitch in the United 
States after an outbreak that began in December at 
Disneyland in southern California. Many media outlets 
and politicians have focused on the country’s growing 
anti-vaccination movement. However, the bigger problem 
lies elsewhere. The United States was declared free of 
measles in 2000, and all outbreaks since then have been 
sparked by imported cases, which will continue to occur 
until measles is eradicated worldwide. 

The World Health Organization (WHO) has set targets 
for 2015, but progress towards them has been slow (see 
‘Targets in trouble’). After the Measles & Rubella Initiative 
was founded in 2001, the number of cases and deaths 
fell. But progress against the disease started to stall in 
2007 (see ‘A fall...then a stall’) and vaccination coverage 
plateaued in 2010, when funding plummeted during 
the global economic slowdown (see ‘Vaccination rising’). 
The WHO now concedes that few countries will attain 
anywhere near the targets. 

The United States has some grounds for concern.
Last year saw 644 cases in 27 states, a record high
since 2000. And by 2013, the proportion of eligible
children who had been vaccinated had dropped by
2% since 2004, to 91%. But the nation’s vaccination
coverage remains high compared with other countries 
(see ‘Vaccination coverage worldwide’), and the number 
of cases is also small. The recent US outbreak infected 
121 people. But China saw 107,000 people infected last 
year, and the Democratic Republic of the Congo (DRC) 
had 89,000 cases in 2013 (see ‘Largest outbreaks’). The 
proportion of people who die varies depending on where 
you are (see ‘Different conditions, different disease’). 

The last pockets of a disease are always the hardest 
to eliminate, as shown by polio eradication, which has 
been ‘just around the corner’ for years. But if the Measles 
& Rubella Initiative can get vaccinations back on track 
worldwide, measles may yet follow smallpox, the only
killer human disease so far to be wiped out in the wild.
Vaccination rising

The proportion of 1-year-olds vaccinated against measles
worldwide has soared since 1980, and that has cut the number of
cases. However, the global financial crisis in 2007-08, weak
health-care systems and conflict have all hampered prevention efforts.
[Chart: vaccination coverage (%), 1980-2014. Annotations: ‘Increased from 73% to 83%’; ‘Remained at 83-84%’.]


Targets in trouble

The World Health Organization has set targets for 2015 that cut the
number of cases and deaths and improve vaccination coverage from
2000 levels. But it looks unlikely that any of these will be met.
ACHIEVED SO FAR: 67% fewer cases; 75% fewer deaths.


A fall ... then a stall

The number of cases has tumbled
from a staggering 4.1 million in
1981 to 191,343 in 2014. But the
decline stalled in 2007 and case
numbers have plateaued since.
[Chart: cases and deaths, 1980-2014, with an expanded region for recent years.]


Largest outbreaks

The US outbreak that
sparked the recent debate
over vaccination is tiny
compared with ones
elsewhere.
Different conditions, different disease


The proportion of infected people who die (case fatality rate) varies depending on
many factors, including the quality of health care, nutrition and natural immunity.


Setting | Case fatality rate
Developed world | <0.5%
Most developing countries | <6%
High levels of malnutrition and lack of adequate health care | <10%
Emergencies or isolated areas with low natural immunity or low vaccination coverage | <30%


SOURCE: WHO. DESIGN: JASIEK KRZYSZTOFIAK/NATURE. 
2015 VACCINATION TARGET
90% of children aged 1 year
vaccinated.


2015 INCIDENCE TARGET
Fewer than 36,500 cases
worldwide.


2015 MORTALITY TARGET
Fewer than 26,750 deaths.


Deadly data 


The number of deaths worldwide 
has fallen and stalled in step 
with the number of cases. And at 
145,700, the number of deaths 
in 2013 was slightly higher than 
that in 2007. The number of 
people measles kills depends on 
many factors (see ‘Different
conditions, different disease’). 



Vaccination lows 


More than three-fifths of the estimated 

21.5 million children who were not vaccinated 
against measles at 9 months of age in 2013 
came from six countries. 


India
Nigeria | 2.7 million
Pakistan | 1.7 million
Ethiopia | 1.1 million
Indonesia | 0.7 million
DRC | 0.7 million
Vaccination coverage worldwide


With 91% of 1-year-olds vaccinated, the United States comes in
ahead of the WHO’s 2015 target, but behind the 80 countries that
have already attained the WHO’s 2020 target of 95%.

36 countries, including
Tanzania, Morocco, Greece,
Cuba and South Korea, have
attained 99% coverage.

[Chart: vaccination coverage (% of 1-year-olds) by country. Labelled points: Central African Republic, South Sudan, Somalia, Nigeria, DRC, world average, Austria, US, UK, WHO 2015 target, WHO 2020 target.]
NEWS IN FOCUS


Seven thousand stories 
capture impact of science 


Language analysis reflects how case studies succeeded in a unique UK research assessment. 


BY RICHARD VAN NOORDEN 


Science benefits society in
myriad ways — but how to 
identify and encourage work 
with high impact is an obsession of 
funding agencies the world over. 
Last month, the United Kingdom 
brought new data to bear on the 
problem: almost 7,000 case stud- 
ies chronicling the economic, 
cultural and social benefits of 
the nation’s scholarship, which 
were solicited as part of a unique 
assessment exercise. As policy- 
makers pore over the documents, 
Nature has commissioned its own 
analysis, revealing how researchers 
described the worth of their work 
to their paymasters, and hinting at 
buzzwords, including ‘million’ and
‘market’, that garnered high marks.
Many funding bodies ask aca- 
demics to plan for the broader 
impacts of their work when they 
apply for grants. But the United 
Kingdom wanted to reward 
impact that had already been 
achieved, says Steven Hill, head 
of research policy at the Higher 
Education Funding Council for 
England (HEFCE). The country 
already has an audit culture: it 
grades the quality of university 
research every few years, and 
hands out £2 billion (US$3 billion) 
annually on the basis of that assess- 
ment. For the 2014 audit, known 
as the Research Excellence Frame- 
work, or REF, HEFCE tweaked 
the rules. It added a requirement
that universities send in case
studies detailing their work’s
wider impact during 2008-13,
and announced that 20% of an
institution’s final grade would be
based on these contributions (see
Nature http://doi.org/zx8; 2014).


MORE ONLINE

Why the US nixed a carbon-capture project go.nature.com/gqvgyx
How light makes moths vulnerable to bats go.nature.com/wi2xza
Tapeworms battle it out for dominance in host’s guts go.nature.com/uxokp9
Ice ages produce ocean-floor hills go.nature.com/olumls


POWER WORDS

In the UK REF assessment, impact case studies denser in terms such as ‘major’
and ‘million’ tended to be scored more highly, a text-mining analysis suggests.

[Chart: correlation between word density and assessment score, by discipline. Terms shown include ‘Government’, ‘World(wide)’ and ‘Develop(ed/ment)’.]

REF = Research Excellence Framework. A value of +0.5 means that case studies mentioning a word at
double the average density were correlated with an assessment score 0.5 higher than average. (In the
assessment, scores ran from 1 to 4.) The words displayed are all high-frequency terms that showed
statistically significant correlations in multiple disciplines. Eleven of 36 disciplines studied are shown. Case
studies were analysed in clusters, because individual scores for each study were not released.

For a larger image and methodology, see go.nature.com/jcvcod.
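
The note to the ‘Power words’ chart reads each plotted value as the change in assessment score that goes with a doubling of a word's density. One way to picture that is as the slope of a straight-line fit of score against the base-2 logarithm of relative density. The Python sketch below illustrates that reading only: the logarithmic model, the function name and the cluster figures are all assumptions made for the example, and the analysis's actual methodology (at the link given in the chart note) is not reproduced here.

```python
import math


def score_shift_per_doubling(density_ratios, scores):
    """Least-squares slope of score against log2(relative word density).

    density_ratios: a word's density in each cluster of case studies,
        divided by the average density (so 2.0 means double the average).
    scores: the mean assessment score (1 to 4) of each cluster.
    """
    xs = [math.log2(r) for r in density_ratios]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    return sxy / sxx


# Hypothetical clusters, invented for illustration (not REF data).
ratios = [0.5, 1.0, 2.0, 4.0]
cluster_scores = [2.0, 2.5, 3.0, 3.5]

if __name__ == "__main__":
    print(score_shift_per_doubling(ratios, cluster_scores))
```

Working with clusters rather than single documents mirrors the chart note's point that individual scores for each case study were not released.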

Meeting that challenge was a 
massive effort that sometimes 
involved hiring specialist writ- 
ers and consultants. University 
College London alone wrote 300 
case studies that took around 
15 person-years of work, and hired 
four full-time staff members to 
help, says David Price, the univer- 
sity’s vice-provost for research. 

The results have impressed. 
“Every government wants to know 
the societal impact of its research,” 
says Diana Hicks, who studies 
science and technology policy at 
the Georgia Institute of Technol- 
ogy in Atlanta. “The difficulty is 
how to do that broadly when you 
only have isolated case studies. 
Britain has cracked that problem 
and produced a wonderful data 
source.” 

The case-study narratives dem-
onstrate “extraordinary breadth
and depth”, says Jonathan Grant, a
public-policy researcher at King’s
College London. They range from
chemists who used nanoparticles
to prevent bacteria damaging the
wood of a sunken sixteenth-cen-
tury warship to economists who
tested the effects of cash transfers 
to poor households in Mexico and 
Colombia. 




SOURCE: PAUL GINSPARG/HEFCE 


To draw further insights, Nature asked Paul 
Ginsparg, a physicist at Cornell University in 
Ithaca, New York, who has experience in text- 
mining, to run a statistical analysis of the lan- 
guage used in the case studies. 

A straightforward word count revealed,
unsurprisingly, that the terms ‘research’ and
‘impact’ were the most common, with 200,000
and 135,000 appearances respectively, after
words such as ‘the’ or ‘and’ were removed.
‘Development’, ‘policy’ and ‘health’ also topped
the lists. Notably, the documents name-check
more than 190 countries, suggesting that the 
research has huge geographical reach. 

Ginsparg also looked for statistically
significant correlations between the use of
certain words and the scores awarded. He
found that across the disciplines, texts dense in
words such as ‘million’, ‘market’, ‘government’,
‘major’ and ‘global’ tended to be given high
scores by the judges, who were told to mark on
the basis of ‘significance’ and ‘reach’, whereas
over-use of terms such as ‘conference’, ‘univer-
sity’, ‘academic’ and ‘project’ correlated with
lower grades (see ‘Power words’).
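
To make the idea of a word-score correlation concrete, here is a minimal sketch in Python. It is not Ginsparg's method, which the article does not describe in detail. It assumes one simple reading of the figure note: the reported value is the least-squares slope of score against the base-2 logarithm of a word's density relative to the average, so that +0.5 means a doubling of density goes with a score 0.5 higher. The function names and the toy numbers are hypothetical.

```python
import math
from statistics import mean


def word_density(text, word):
    """Fraction of tokens in `text` equal to `word` (case-insensitive)."""
    tokens = [t.strip(".,;:'\"()") for t in text.lower().split()]
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t == word) / len(tokens)


def score_shift_per_doubling(densities, scores):
    """Least-squares slope of score against log2(density / average density).

    Assumption: every density is greater than zero, and each entry stands
    for one cluster of case studies with a single averaged score.
    """
    avg = mean(densities)
    xs = [math.log2(d / avg) for d in densities]
    x_bar, y_bar = mean(xs), mean(scores)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, scores))
    return sxy / sxx


# Toy data (invented): three clusters whose density of a word doubles
# each time while the average score rises by 0.5 each time.
toy_densities = [0.001, 0.002, 0.004]
toy_scores = [2.5, 3.0, 3.5]
slope = score_shift_per_doubling(toy_densities, toy_scores)
print(round(slope, 3))
```

A slope like this says nothing about causation, and a real analysis would also need a significance test across many clusters and disciplines.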

Although the correlations do not indicate 
causation, they might hint at judges’ preference 
for narratives of economic impact in particu- 
lar, speculates Gemma Derrick, a researcher at 
Brunel University London who is examining 
how the studies were collected and assessed. 


“I was sceptical about the ‘impact’ process, 
but now I think it’s a good thing,” says Price, 
who says it has revealed persuasive stories that 
the university can present to funders, industry 
partners, governments and alumni. 

Some UK academics question whether the 
impact component to the research assess- 
ment will make a significant difference to how 
regional funders distribute their cash — and if 
not, whether it was worth adding. The formula 
that will link performance on the assessment
to funding allocation will not be released
until March, but it is already clear that
universities that have traditionally
excelled in the audit of academic output
(Oxford, Cambridge and Imperial College
London) also score highly on impact.

Internationally, some researchers criticize
the idea of identifying research impact using
case studies, rather than by tracking more
quantifiable economic measures. “I am baffled
why a scientific community would go through
such a burdensome and artisanal system,” says
Julia Lane, an economist at the American Insti-
tutes for Research in Washington DC and for-
mer director of a US government programme
called STAR METRICS, which monitors the
economic benefits of money spent on research, 
including job creation, patents and spin-out 
companies. On 27 January, a network of Euro- 
pean researchers — mainly economists — met 
in Brussels for the first formal meeting of an 
effort to trace how science funding in Europe 
leads to wealth and employment across soci- 
ety. The effort is strongly influenced by STAR 
METRICS. 

Whether anyone will repeat the United 
Kingdom's impact assessment remains an open 
question. “We know lots of other countries are 
interested in learning from our experience,” 
says Hill. Across the world, most countries that 
have introduced nationwide assessments of 
research quality, such as Australia and Italy, do 
not measure impact. Yet governments in both 
Sweden and the Czech Republic are currently 
considering an exercise similar to the REF. 

Back in the United Kingdom, researchers 
are already preparing for the next perfor- 
mance audit, in 2020, with mixed feelings. 
“We will all be encouraged now to do more 
research that could form a case study — 
whether you think this is a good thing or not 
depends on your subject area,” says Dorothy 
Bishop, a neuropsychologist at the Univer- 
sity of Oxford. “I have a concern that I may 
be stuck spending more time evaluating the 
impact of what I do and this will take me away 
from actually doing it.” ■ SEE EDITORIAL P.137
Scientists are pushing the properties of light to new
extremes. A special issue explores these frontiers.

From glorious rainbows to the intricate
mechanics of the human eye, light lies at the
heart of phenomena that have fascinated
scientists for millennia. Today, the latest optical
technologies, from lasers to solar cells, harness
light to advance physics and to serve society's needs.

To put light itself in the spotlight, the United
Nations designated 2015 the International Year
of Light and Light-based Technologies. The cel-
ebration is also pegged to a string of anniversaries:
Augustin-Jean Fresnel’s proposal in 1815 that light
is a wave; James Clerk Maxwell's 1865 electromag-
netic theory; Albert Einstein's 1915 general theory
of relativity; and in 1965, discovery of the cosmic
microwave background (CMB) radiation and the
development of optical fibres for communication.

Nature is paying its own tribute to light in this
special issue. Contorting light is the goal of three
physicists profiled in a News Feature on page 154:
Miles Padgett twists laser beams to encode binary
information; Pierre Berini reshapes light waves to
speed up digital communications; and Margaret
Murnane dissects X-rays into ultrafast attosecond
pulses, one billionth of a billionth of a second long,
to probe materials in exquisite detail.

Some advances in the physics of light are of
great benefit to biology and medicine. Borrowing
from astronomers, biophysicists are developing
techniques for seeing through opaque layers, by
detecting the minute glow of visible light scattered
through body tissues. Such methods are likely
to lead to more-powerful medical imaging, as
explained on page 158.

In another sphere entirely, near-speed-of-light 
communications are set to transform financial 
trading as laser links between banking centres 
come online. But there are major risks, Mark 
Buchanan explains on page 161. Trading stocks 
in milliseconds pushes algorithms to their lim- 
its, exposing flaws that can escalate in seconds to 
cause hundred-million-dollar losses. 

In a News & Views Forum on page 170, two 
cosmologists reflect on the clues to the origin of 
the Universe hidden in its oldest light, the CMB. 
And on page 164, physicist Jim Al-Khalili is daz- 
zled by the afterglow of a 1,000-year-old treatise 
on the nature of light: Ibn al-Haytham’s Book of 
Optics. An online collection will highlight key 
papers on light from journals across Nature 
Publishing Group throughout the year (see 
nature.com/yearoflight). 

With so many facets, scientists’ fascination with 
light looks unlikely to fade. ■


LIGHT 


A Nature special issue 
LEADING

Shape it, squeeze it, energize it or tie it into knots. Scientists are
taking light to new extremes.

BY ELIZABETH GIBNEY

SHAPING LIGHT

Physicist Miles Padgett starts
to describe the concept of
twisted light by taking down
a rainbow-coloured spiral that
hangs from the ceiling of his
office at the University of Glasgow, UK. Then
he stops and scours the room for more props:
dinner plates, paper, pencils and even leftover
Christmas chocolates.

Light is made of oscillating electric and mag-
netic fields, he explains. In a conventional laser
beam, the oscillations are always in step, with
the peaks and troughs lined up from one side
of the beam to the other. (Padgett illustrates the
flat, planar waves with a stack of dinner plates
that he moves face-forward.)

But things get more interesting when parts of 
the beam fall out of step. This is where Padgett 
points to the spiral: the peaks of the wavefront 
can be manipulated to the point at which they 
curl around the beam’s direction of motion in 
a corkscrew. This is twisted light, says Padgett,
who has spent two decades learning to exploit 
its unique properties. 

He has pioneered applications that range 
from moving cells without physically touch- 
ing them to packing lots of information into 
an optical signal — and even tying light in 
knots. In the process, he has developed a 
rare instinct for the subject, say collaborators 
and colleagues. “Many other scientists might
need to do a calculation, run a model or do an
experiment before they can get an idea about
how light should behave,” says Mark Dennis, a
theoretical physicist at the University of Bris-
tol, UK. “One of Miles’s great talents is having
this knack at being able to anticipate what the
results should be.”

Props are not the only thing in Padgett’s office. 
It houses the lab’s coffee machine, and doubles
as its kitchen and common room — complete 
with sink. Padgett is a fan of productive chance 
encounters, and likes to keep the place buzzing 
with people picking each other’s brains. 

It was a chance encounter that led him to 
twisted light in the first place. In 1994, as a 
research fellow at the University of St Andrews, 
UK, he had dinner with physicist Les Allen 
intending to discuss laser technology. But the 
conversation turned to Allen’s experiments 
with twisted light. Allen, then at the Univer-
sity of Essex in Colchester, UK, baited Padgett
by saying that he knew how to give the light its 
twist using the stem of his wine glass as a lens. 
This strange idea had Padgett hooked. By 1997,
he and his colleagues had not only learned how
to make twisted light for themselves, but had
also devised a way for it to function as an ‘opti-
cal spanner’ to trap cells and other microscopic
particles, then rotate them into any position.

LIGHT
A Nature special issue
nature.com/light2015

Turning light into a spanner is really about 
shaping it, Padgett says. A very simple example 
of shaping is a digital projector, which creates a 
changing image by altering a beam’s intensity 
pixel by pixel. A more sophisticated example 
is a liquid-crystal device that does nothing to 
the intensity of the light passing through each 
pixel, but instead shifts its ‘phase’ — the rela- 
tive position of the wave's peaks and troughs. 
In the stacked-dinner-plate analogy, the plates 
collectively warp and bend. 

Getting to twisted light is a matter of taking 
that warping to extremes — so that the wave- 
fronts form a spiral. That twist means that the 
beam not only exerts radiation pressure on the 
objects it encounters, nudging them forward, 
but also tries to rotate them. “It’s just like turn-
ing and pushing a door knob to open a door,”
says Padgett. The optical spanner passes this 
momentum to microscopic objects to trap, 
rotate and move them. Using such devices, 
biologists can bump beads into cells to meas- 
ure the cells’ stiffness, and engineers can create 
unique nanoscale materials. 

Twisted light also provides a new way 
to encode information. The conventional 
approach to doing this with light is to encode 
each bit as a single photon spinning either 
clockwise or anticlockwise around its direction 
of motion. Quantum mechanics allows only 
those two possibilities, so this gives a natural 
way to represent the 1s and 0s of binary code. 

But twisted light has an extra rotational 
quantity known as orbital angular momentum. 
This differs from intrinsic spin in the same way 
as Earth’s yearly motion around the Sun dif-
fers from its daily rotation on its axis. And it is
much less constrained by quantum mechanics.
In theory, says Padgett, twisted light can have 
an infinite number of orbital angular momen- 
tum patterns, or modes, each twisted tighter 
than the last. “This is like having a whole 
alphabet with which to communicate,” he says. 
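
The "alphabet" comparison can be put in numbers with a short sketch. With only two spin states, each symbol carries one bit. With an alphabet of N distinguishable twisted modes, each symbol carries log2(N) bits. The Python below is only an illustration of that counting argument: the function names are hypothetical, the mode indices are abstract labels rather than a real modulation scheme, and 16 is used as an illustrative alphabet size.

```python
import math


def bits_per_symbol(alphabet_size):
    """Information carried by one symbol drawn from `alphabet_size` states."""
    return math.log2(alphabet_size)


def to_symbols(value, alphabet_size, length):
    """Write a non-negative integer as `length` symbols, most significant first."""
    symbols = []
    for _ in range(length):
        value, digit = divmod(value, alphabet_size)
        symbols.append(digit)
    return symbols[::-1]


def from_symbols(symbols, alphabet_size):
    """Inverse of to_symbols: rebuild the integer from its symbols."""
    value = 0
    for s in symbols:
        value = value * alphabet_size + s
    return value


grey_level = 200  # one 8-bit grey-scale pixel value, chosen arbitrarily

spin_symbols = to_symbols(grey_level, 2, 8)    # two states: eight symbols needed
mode_symbols = to_symbols(grey_level, 16, 2)   # sixteen states: two symbols needed

print(bits_per_symbol(2), spin_symbols)
print(bits_per_symbol(16), mode_symbols)
```

The same pixel value needs eight two-state symbols but only two sixteen-state symbols, which is the sense in which extra modes let one optical signal carry more data.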

A decade ago, Padgett was among the first
to show that each mode can be used to encode
different information, such as shades of grey
or numbers, which allows much more data
to be carried by the same optical signal than
is possible with just spin encoding. Last year,
a team at the University of Vienna encoded
grey-scale images of Wolfgang Amadeus
Mozart and other famous Austrians using
16 twisted modes, and successfully sent the
images through 3 kilometres
of air (see Nature http://doi.org/ztt; 2014).
By using extra
channels of information, such
techniques could increase
the data-carrying capacity of
fibre-optic cables and radio
waves.

Padgett has found even more imaginative
ways to play with twisted light. When a beam of
it illuminates a wall, for example, the spot will
have a dark centre. That is because a spinning
beam of light has a vortex in the middle where
intensity is zero. Look closely at a spot of laser
light, says Padgett, and it seems to be riddled
with such dark spots, known as speckles. If you
could trace these spots back through the laser
beam, they would form continuous lines of zero
intensity twining in three dimensions. “These
can be like cooked spaghetti, or you can form
them into spaghetti hoops or even chain mail,”
says Padgett. (He points to a poster on his wall
showing what that looks like: its title is ‘Speckle-
ghetti’.) In 2010, he and his collaborators showed
how to form the lines into knots. It took theorist
Mark Dennis at the University of Bristol, UK,
a decade to create the complex mathematical
recipe of overlapping beams needed to make
an isolated, pretzel-like knot. Only with the
recipe in hand could Padgett’s team use its light-
shaping skills to make the abstract mathematics
a physical reality.

Padgett believes that the best way for a person
to succeed is for them to find something they are
good at, and then to apply it everywhere. “Our
team can shape light beams,” he says. “So we use
shaped light in communications, microscopy,
in imaging, in sensors. We always ask, how can
we apply what we know to areas that others are
interested in?” He is using that philosophy in
his latest project: leading the Quantum Imaging
Hub, a collaboration between 6 universities and
30 companies, which is one of
the 4 Quantum Technology
Hubs launched last November
by the UK government.
His group is creating infrared
cameras that use a single-pixel
detector rather than the millions
of expensive pixels in a
conventional camera. By projecting masks of
black and white squares onto an object, flicker-
ing 20,000 times a second, the team can measure
how incoming intensity varies, and reconstruct
a picture. “It’s a convoluted, but much cheaper,
way of doing the job,” says Matthew Edgar, a
physicist in Padgett’s lab. With image-compres-
sion techniques and boosted computer power,
the team hopes to extend the technique to video,
allowing infrared cameras to spot gas leaks or
see through smoke.

Back in his office and packing up to head into
the Glasgow rain, Padgett reflects on what he
loves about light. It is not its endless uses. Instead,
he says, the beauty of light is that the more deeply
you understand it, the more straightforward it
gets. “If light ever surprises me, it’s not in its com-
plexity, but that it is so simple,” he says. ■

SQUEEZING LIGHT

Pierre Berini

Pierre Berini knows a bargain when he sees one; the evi-
dence is in his lab, which is cluttered with lasers, oscillators
and other components that he bought at auctions after
local companies folded. The University of Ottawa physi-
cist often buys in batches, after spotting essential items in a
job lot that otherwise looks like junk. “There are lots of surprises,” he says.

Berini has a certain sympathy for the failed companies. He is a leader in
plasmonics, a way of manipulating electrons with light that could be used
to transmit information in super-fast computers. Months after launching
a venture-backed firm, Spectalis, to market plasmonics circuitry to the
communications industry in the early 2000s, he began to feel the effects of
the dot-com bubble bursting. He ended up hosting an auction of his own
and closing shop. Unperturbed, he plans to try again this year, launching
a company to apply his technology to tiny sensors in handheld devices
that detect diseases rapidly and with extreme sensitivity.

The devices use a peculiar kind of light that emerges from waves
of electrons propagating across a metal surface in contact with an
insulator, such as air or glass. When excited with a laser, these charges, or
plasmons, generate fluctuating electric and magnetic fields that flow just
above the metal surface. Trapped at the interface, the waves can be fun-
nelled into structures that confine their wavelengths to a few tens of nano-
metres, as little as one-tenth of the laser’s wavelength. The squeezed
waves travel more slowly than laser light, so can retain the same frequency.

Berini backed into studying plasmonics while looking for ways to
improve normal electrical components and photodetectors in the late
1990s. Light travels much faster than electrical signals, so using it to
connect silicon chips would massively speed up calculations. But light
is limited by its wavelength: although electronic devices can be shrunk to
a few tens of nanometres, the infrared light used in telecommunications
cannot focus to spots much smaller than a micrometre. “It’s a fundamental
incompatibility,” says Berini. The smaller wavelengths available with plas-
mons looked promising, but plasmonic light does not always behave. The
waves, created by the movement of electrons, decay quickly as a result of
resistance in the metal, and they travel only micrometres.

Berini used tools that can craft nanoscale structures, which were
becoming cheaper and more readily available, to create the first plasmonic
waves that could travel for centimetres (ref. 8). His lab made whole cir-
cuits, guiding plasmons down metal strips less than 30 nanometres thick.

But allowing the waves to travel farther increases the light’s wavelength.
Although plasmonic waves are still smaller than conventional light waves,
the compromise lessened their advantage and Berini found it tough to
crack the telecommunications industry, where each component in use
had been honed over decades. So he and others have been busy developing
other techniques to deal with the short range of plasmonic light, either
by branching out into applications that turn the loss into an advantage, such
as photodetectors, or by using nanostructures to amplify the waves. Physi-
cists are now developing an assortment of nanoshapes (stars, rods and
crescents) in a range of materials that could harness these waves for
applications such as capturing solar energy, killing cancer cells and creat-
ing chip-integrated lasers, known as spasers.

Henry Schriemer, a physicist at the University of Ottawa, calls Berini a
“quintessential experimentalist with a deep appreciation for the theory”.
But Berini says that it is applications that turn his lab on; he attributes
this entrepreneurial bent to his parents, who ran their own businesses in
Timmins, the Ontario mining and logging community where he grew up.

Today, Berini is recycling the efforts made in long-range circuits to
make a detector for dengue fever. The device, a handheld biosensor
developed last year with researchers at the University of Malaya in Kuala
Lumpur, sends plasmon waves down a chip scattered with dengue virus
particles. A blood sample is placed on the chip; if the donor has the infec-
tion, the sample will contain antibodies that bind to the virus, disrupting
the wave and producing a signal. Berini says that the sensors could speed
up diagnosis, which normally involves sending samples away to a lab.

A new company is now in the works to commercialize a range of simi-
lar biosensors. But Berini believes that the application is just one of many
that squeezed light will have in the future. “With plasmonics, there is a
lot of new physics to be uncovered,” he says. All of which means that
some of the random equipment that litters the lab might find a new use. ■

FAST LIGHT

Margaret Murnane

GLENN ASAKAWA/UNIV. COLORADO

When Margaret Murnane was
growing up in rural County
Limerick, Ireland, in the 1960s,
she had no talent for activities
considered suitable for girls,
such as sewing or art, and never thought of
herself as being good with her hands. What she
did enjoy was going on long walks with her
father, and gazing at rain-drenched Ireland’s
multitude of rainbows, an activity that led
to a lifelong fascination with light. In follow-
ing that passion, she says, “it turned out I have
a talent I never knew for aligning lasers. But in
normal life, how would you ever know?”

Murnane’s life is now that of a physicist at
JILA in Boulder, Colorado, a joint institute
between the University of Colorado and the US
National Institute of Standards and Technology.
There, with husband Henry Kapteyn, she runs
a lab that is leading development of an X-ray
laser that strobes in attosecond pulses, each
blast lasting just one-billionth of one-billionth
of a second, almost the same proportion of a
second as that second is of the entire age of the
Universe. Such ultrafast X-rays, which have tiny
wavelengths and high energies, are often used
to penetrate deep into atoms and image them
at the nanometre scale. Usually, this happens
at billion-dollar facilities that generate X-rays
by accelerating electrons to near light speeds,
such as the SLAC Linac Coherent Light Source
in Menlo Park, California. By contrast, Mur-
nane’s set-up fits on a dining-room table. It
allows scientists to watch the movement of elec-
trons around atoms, probing chemical bonds or
studying spins in a magnetic hard drive.

Murnane’s background (a childhood spent
without central heating or indoor plumbing,
but with a love of knowledge and learning)
lies behind much of her drive, says Kapteyn.
“She worked her way up,” he says. Murnane
met Kapteyn as a graduate student at the Uni-
versity of California, Berkeley, and the two
have worked together ever since, forming
a stable partnership that Murnane believes
underlies their scientific success. “It helps to
have someone who will challenge you hard.
Those relationships are good for science, but
difficult for individuals to learn,” she says.

Together they tackled a problem that they
first attempted in graduate school: how to
generate laser-like light beams at high energies.
Rather than accelerating electrons, as huge
facilities do, their strategy was to combine
many visible-light photons into a handful of
higher-energy X-ray photons. The process
has an analogy with sound. In stringed instru-
ments, plucking a string gently generates a
single tone. “If you pluck it harder and harder,
higher harmonics emerge,” says Murnane,
each at larger integer multiples of the original
frequency. When ultrashort-pulse lasers were
developed in the 1990s, Murnane and Kapteyn
realized that they might be able to use them to
‘pluck’ an electron violently (accelerating it
away from and back towards an atom of helium)
and thereby generate harmonics in the form
of higher-energy photons. The team succeeded
in making bright ultraviolet beams, but it was
more difficult to increase the energy while
keeping the beam laser-like, with the waves
emerging in synchrony.

Murnane often says that she picked physics
“because it was the hardest subject” at uni-
versity, an attitude that stood her in good
stead with this challenge, which took 15 years
to solve. The solution was to engage in what
she calls “a very different way of thinking”,
and start not with visible-light
lasers, but with longer-wave-
length infrared lasers. The
photons had much less energy
than before. But they resonated
much more strongly with the
electrons in the helium atoms
(in effect, giving the string a
much stronger pluck), which
allowed the team to combine more than
5,000 laser photons into a single X-ray photon.
Theorists believed that the technique would
be too inefficient to make usable beams. But
by carefully tuning the helium gas so that the
laser and X-rays travelled at the same speed,
Murnane’s team predicted, then proved, that
the X-rays would emerge in step, as a bright
beam. “What was amazing was not just that
they got the X-rays, but that they got plenty of
them,” says Mikhail Ivanov, a physicist at the
Max Born Institute for Nonlinear Optics and
Short Pulse Spectroscopy in Berlin.

Murnane and Kapteyn have now made
ultrafast lasers that produce X-rays of up to
1,000 electronvolts in energy, and in attosecond
pulses. Although these devices do not reach the
energies or brightness attained at the big free-
electron laser facilities, they come close. And, at
US$1 million, they are around one-thousandth
of the price. The lab at JILA has eight such
lasers, and discoveries in the nano-world are
starting to trickle in. Murnane both builds and
uses the lasers, processing the X-ray scatter
patterns to capture images of charge and spin
flows within materials. One counter-intuitive
finding is that nanometre-sized heat sources
cool quicker when packed closer together.

Murnane, together with collaborators includ-
ing Kapteyn and Tenio Popmintchev at JILA
and Andrius Baltuska at the Vienna University
of Technology, is still working on refining the
desktop set-up to make it even faster, more
energetic and smaller. That would allow them
to probe even quicker processes, deeper within
materials and with higher resolution. “We're
pretty optimistic we can do it,” says Murnane.

After visible lasers were invented in 1960,
they underwent rapid development; the same
revolution is now happening for tabletop X-ray
sources. Other labs around the world have
developed similar approaches,
says Olga Smirnova, a theorist
at the Max Born Institute. But
what makes the JILA technique
stand out is the ability to pro-
duce such high-frequency light,
with such efficiency. And then
there is Murnane herself, says
Smirnova: “She is really able
to push the envelope of what is possible, year
after year.”

Murnane insists that they have not reached
the limit yet: that higher-energy X-rays and
even faster, zeptosecond (10⁻²¹ s) pulses may be
possible. “A misconception in science some-
times is that lasers are now an old technology,
and there's nothing new to learn,” she says.
“That’s so far from the truth.” ■


Elizabeth Gibney is a reporter for Nature in 
London. 
SUPER VIS


USING TECHNIQUES ADAPTED FROM ASTRONOMY, PHYSICISTS ARE FINDING WAYS 
TO SEE THROUGH OPAQUE MATERIALS SUCH AS LIVING TISSUE. 


by Zeeya Merali 


It seemed too good to be true, says Allard Mosk. It was 2007, and
he was working with Ivo Vellekoop, a student in his group at the
University of Twente in Enschede, the Netherlands, to shine a beam
of visible light through a ‘solid wall’ (a glass slide covered with
white paint) and then focus it on the other side. They did not have a
particular application in mind. “I really just wanted to try this because
it had never been done before,” Mosk says. And
in truth, the two researchers did not expect to
pick up much more than a faint blur.

But as it turned out, their very first attempt
produced a sharp pinprick of light a hundred
times brighter than they had hoped for. “This just doesn't happen on the
first day of your experiment,” exclaims Mosk. “We thought we’d made a
mistake and there must be a hole in our slide letting the light through!”

But there was no hole. Instead, their experiment became the first of
two independent studies that were carried out that year pioneering
ways to see through opaque barriers. So far it is still a laboratory exercise.
But progress has been rapid. Researchers have
now managed to obtain good-quality images
through thin tissues such as mouse ears, and
are working on ways to go deeper. And if they
can meet the still-daunting challenges, such as
dealing with tissues that move or stretch, potential applications abound.
Visible-light images obtained from deep within the body might elimi- 
nate the need for intrusive biopsies, for example. Or laser light could be 
focused to treat aneurysms in the brain or target inoperable tumours 
without the need for surgery. 

“Just ten years ago, we couldn't imagine high-resolution imaging down 
to even 1 centimetre in the body with optical light, but that has now
become a reality,” says Lihong Wang, a biomedical engineer at Washing-
ton University in St. Louis, Missouri. “Call me crazy, but I believe that we
will eventually be doing whole-body imaging with optical light.”


RICH SOURCE 

It is already possible to peer inside the body with X-rays and ultrasound.
But the images produced by such tools are crude compared with those 
that should be possible with visible light. Partly this is because visible- 
light images tend to have higher resolution, says Wang. But it is also 
because optical wavelengths interact strongly with organic molecules, 
so the reflected light is packed with information about biochemical 
changes, cellular anomalies and glucose and oxygen levels in the blood. 

However, those interactions also make visible light prone to scatter- 
ing and absorption. Absorption will scupper any imaging attempt: the 
information the photons pick up is lost as they are absorbed into the 
material. Scattering, however, preserves a ray of hope. Many materi-
als, such as skin, white paint or fog, are ‘opaque’ only because photons
passing through them ricochet until they are thoroughly scrambled. 
But they are not lost — so in principle, the scrambling can be reversed. 

Astronomers have already solved a version of this scattering problem 
using a technology called adaptive optics, which allows them to undo 
the distortions imposed on images of stars, planets and galaxies by the 
scattering of light in the atmosphere (see Nature 517, 430-432; 2015). 
The basic idea is to collect light from a bright reference star and use an 
algorithm to calculate how the atmosphere has smeared and blurred 
its point-like image. The algorithm then controls a special ‘deformable’ 
mirror that cancels out the atmospheric distortions, turns the guide-star 
image into a point, and at the same time brings other distant objects 
into sharp focus. 

Unfortunately, this technique is tough to use in the body. Targets deep 
inside biological tissues do not shine the way that stars do — they have 
to be illuminated from the outside — and the scatterers are much more 
densely packed than those that scatter light in the atmosphere. “You’d
need the equivalent of a deformable mirror with billions of moving parts
to compensate for the scattering caused by an egg shell,” says Ori Katz, 
an optical physicist at the Langevin Institute in Paris. That is why Mosk 
and Vellekoop were not too hopeful of success when they started. Still, 
the pair took heart from the advance of technology. “Until recently it 
had been preposterous to think you could control a million pixels, but, 
by 2007, every smartphone could do it,” says Mosk. 

They therefore made use of a ‘spatial light modulator’: a device simi- 
lar to an LCD smartphone display that can control the transmission of 
different parts of a laser beam by delaying one part relative to another. 
They fired their laser through the modulator towards the painted glass 
slide, placed a detector beyond the slide and used a computer to moni- 
tor how much light the detector picked up. The computer then added 
and subtracted delays at each pixel of the modulator, going through a 
process of trial and error to see what changes minimized the scattering 
of the laser light as it passed through the slide. In effect, it was trying 
to give the incoming light a distortion that the opaque barrier would 
exactly cancel out. Mosk and Vellekoop ran the algorithm for more 
than an hour, and when it was done they had a result that beat all their 
expectations: a focus that was a thousand times more intense than the 
background signal.
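
The trial-and-error search described above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the code Mosk and Vellekoop ran: the opaque layer is modelled as a random complex amplitude for each modulator segment, and the function name, segment count and number of trial delays are all hypothetical.

```python
import cmath
import math
import random


def shape_wavefront(n_segments=256, n_phase_steps=8, seed=0):
    """Toy model of trial-and-error focusing through a scattering layer.

    Returns the focus intensity divided by the average speckle background.
    """
    rng = random.Random(seed)
    # Assumption: the opaque layer gives each modulator segment a random
    # complex amplitude (strength and phase) at the detector.
    medium = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_segments)]
    phases = [0.0] * n_segments  # delay applied by each modulator segment
    trial_phases = [2 * math.pi * k / n_phase_steps for k in range(n_phase_steps)]

    # The field at the detector is the sum of all segment contributions.
    total = sum(t * cmath.exp(1j * p) for t, p in zip(medium, phases))
    for i, t in enumerate(medium):
        others = total - t * cmath.exp(1j * phases[i])
        # Try each delay on this segment and keep the brightest result.
        phases[i] = max(trial_phases, key=lambda p: abs(others + t * cmath.exp(1j * p)))
        total = others + t * cmath.exp(1j * phases[i])

    focus = abs(total) ** 2
    # For random delays the average intensity equals the sum of |t|^2.
    background = sum(abs(t) ** 2 for t in medium)
    return focus / background


enhancement = shape_wavefront()
print(f"focus is about {enhancement:.0f} times the average background")
```

In this model the enhancement can never exceed the number of segments, which hints at why controlling very many modulator pixels matters for reaching a large contrast.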

“The Mosk experiment was an eye-opener,” says Katz. “It changed the
paradigm of what could be done with optical light.”

Soon after his success, Mosk learned of similar work being done by
bioengineer Changhuei Yang and his team at the California Institute of 
Technology in Pasadena. 


These researchers had used a different technique to focus scattered 
optical light, and a different opaque substance: a thin slice of chicken 
breast. But they, too, were surprised by how easy it was to do. “I had
thought ‘we’ll spend six months on this, and when it doesn’t work, we’ll
chalk it up as a learning experience’,” says Yang. “But actually it wasn’t
that hard.”

Soon after the two papers were published, the field exploded as other 
physicists rushed to join in. One of them was optical physicist Jacopo 
Bertolotti, who came to work with Mosk in 2010. Bertolotti, now at the 
University of Exeter, UK, says that he was drawn both by the “beauty 
of the experiment” and by the potential it offered for medical imaging. 
But he could see that that goal was still a long way off.


The first issue that Bertolotti faced was that Mosk’s original set-up 
required a camera to be placed behind the opaque surface. That is a 
problem for medical applications because placing a camera under the 
skin would involve surgery, which would be invasive, dangerous and 
rarely worth the risk. In 2012, however, Bertolotti, Mosk and their col- 
leagues devised a way to put both the laser light source and the detector 
in front of the surface.

Their target was a fluorescent Greek letter π just 50 micrometres
across hidden behind a thin opaque screen. As such, the target was
roughly the same size as a cell and analogous to medical techniques
that involve injecting fluorescent dyes into living tissue to aid imaging.
When the laser was switched on, the photons would bounce their
way through the screen and produce a diffuse illumination of the
fluorescent π. The light reflected from the letter would then make its way
back through the screen and produce a blurry speckled pattern on the
other side. It was like trying to see the symbol through a shower curtain.

Yet the shape of the letter was still encoded in the scattered light. To 
retrieve that shape, the team recorded the speckle pattern, moved the 
laser to shine at a different angle, then recorded the new speckle
pattern. By repeating this many times and comparing the patterns point
by point, a computer could work out how the patterns were correlated
and, from that, work backwards to reconstruct the hidden letter π.

That was progress, says Bertolotti, but it still was not good enough. “It 
only works if the object to be imaged is on the other side of the scattering 
medium,” he says. For many medical applications, such as seeing inside
the brain, or within a blood vessel, the target is buried within tissue. 


INSIDE OUT 

The challenge of imaging inside the scattering medium has been taken 
up by a number of groups, including Yang’s and Wang's. In 2013, for 
instance, Yang’s team achieved this feat with unprecedented resolution 
by picking out a fluorescent bead just one micrometre across
sandwiched between two artificial opaque layers.

Yang, together with biologist Benjamin Judkewitz and the rest of his
team, did this by illuminating the medium and letting the light bounce
its way through to the other side, then reflecting it back with a
‘time-reversing’ mirror, which effectively forces every light ray to exactly
retrace its steps. Time-reversing all the rays would simply undo all the
scattering, however. So instead, the team focused an ultrasound beam
(which is not easily scattered) at one point in the medium, knowing
that any optical light that happened to pass through that point would
undergo a slight shift in frequency. Then on the far side, the researchers
set up the time-reversing mirror tuned so that it would send back only
the light that had experienced that frequency shift. The result was a thin,
time-reversed beam that would automatically pass back through the
focus and add its energy to the light from the first pass. This turned the
ultrasound focus into a spot of comparatively high radiation intensity:
“a torch inside the wall”, says Judkewitz, who is now at the Charité
University Hospital in Berlin. Better still, the ultrasound focus could be
moved around within the medium. And when it passed over the bead,
the bead fluoresced (see ‘Light and sound’).

LIGHT AND SOUND
One way to see inside opaque materials is to combine ultrasound with a
‘time-reversing’ mirror system, which forces every light ray to exactly
retrace its steps.

SCRAMBLED LIGHT
In many opaque materials, including living tissue, white paint and fog,
light is not actually absorbed. It simply bounces around in the material
until it is too scrambled to form an image.

ULTRASOUND FOCUS
A beam of ultrasound (yellow rings) is focused at some point within the
material. Any light that happens to bounce through this point will
undergo a slight shift in its frequency (blue rays).
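
Why sending back only the frequency-shifted light produces a bright spot at the ultrasound focus can be sketched with a small numerical toy. It is an illustration under assumptions rather than the team's method: the medium is modelled as random complex couplings between interior points and mirror channels, the time-reversing mirror as complex conjugation, and every name and size below is hypothetical.

```python
import random


def time_reversed_focus(n_channels=500, n_points=50, seed=1):
    """Toy phase-conjugation model; returns focus intensity over background."""
    rng = random.Random(seed)
    # Assumption: each interior point couples to each mirror channel through
    # a random complex amplitude, standing in for multiple scattering.
    coupling = [[complex(rng.gauss(0, 1), rng.gauss(0, 1))
                 for _ in range(n_channels)] for _ in range(n_points)]
    focus = 0  # index of the point tagged by the ultrasound
    # Only light that passed through the tagged point is kept by the mirror.
    arriving = coupling[focus]
    # The mirror returns the complex conjugate, i.e. the time-reversed wave.
    returned = [a.conjugate() for a in arriving]
    # By reciprocity, the same couplings carry the light back inside.
    intensity = [abs(sum(c * r for c, r in zip(row, returned))) ** 2
                 for row in coupling]
    background = sum(intensity[1:]) / (n_points - 1)
    return intensity[focus] / background


print(f"focus-to-background ratio: {time_reversed_focus():.0f}")
```

At the tagged point every contribution adds in phase, whereas at all other points the contributions add with random phases, so the contrast grows roughly in proportion to the number of mirror channels.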

However, the technique was still a long way from seeing into deep 
layers of tissue, which pose another, much tougher challenge: they tend 
to move constantly as a result of blood flow and breathing. “We are still 
not so close to medical applications because these techniques tend to 
work only if the scattering medium is perfectly frozen in time,” says 
Mathias Fink, a physicist at Langevin who pioneered a version of the 
time-reversal technique in the 1990s that used ultrasound alone. Most
groups have reduced the timing from Mosk’s original hour or so to just
tens of seconds, says Katz, and that is fine for imaging a bead or a letter
π, but not for imaging a tumour in the body.

But last year, a team led by Sylvain Gigan, a physicist at the Kastler 
Brossel Laboratory in Paris, and including Katz and Fink, demonstrated 
a way to reconstruct the image of the hidden object in just one camera 
shot. “It’s a bit like magic when you see the algorithm converge on the
final image,” Gigan says.

Wang agrees that speed is of the essence. “Everything is in motion and 
we only have a millisecond-scale window to make an image,” he says. 
In a paper published in January, Wang and his team managed to get
the speed down to 5.6 milliseconds, “which is fast enough for selected
in vivo imaging”, he says. Furthermore, their target was made from
ink-stained gelatin and sandwiched between the ear of an anaesthetized
mouse and a ground-glass diffuser. Getting success with a live mouse is
impressive, says Bertolotti — although he points out that “moving from 
a mouse ear, which is relatively thin, to imaging human skin and flesh 
will still take a lot of extra work”. 

TIME REVERSAL (from the graphic ‘Light and sound’)
A time-reversing mirror sends back only the frequency-shifted light. When
the light retraces its steps, it passes through the ultrasound focal point
and adds its energy to the light coming through on the first pass.

The focal point is scanned through the material. When it passes over
targets labelled with a fluorescent dye, the structures emit a detectable
glow, and researchers can build up a map of the interior.

As of today, Bertolotti adds, there is still no imaging approach that
stands out above the rest. Each has its advantages and disadvantages.
“Rather than developing one technique that’s good for everything, I 
think we'll develop a suite of techniques that could one day all be com- 
bined into the same piece of apparatus,” he says. “I don’t know how 
quickly that might happen, but this is a young and fast-moving com- 
munity, so it could be within a few years.” 

The techniques now being pioneered by bioengineers and physicists 
for medicine could also be put to a range of other purposes. Mosk, for 
example, believes that these methods could be a tool for art restoration.
“Most painters build up works in several layers, and the layers below 
can influence the chemical and physical ageing of the painting, so it’s of 
some significance that you know what is in there if you want to preserve 
it,” he says. Methods that in effect unscatter light could also help the
telecommunications industry to unscramble the noise in optical fibres 
that is caused by scattered light. 

Another obvious customer is the military, says Fink, who thinks that 
the technology could be used to allow soldiers to see through a portable 
shield — either a physical screen or a fogging spray — that obscures 
them from their enemy’s view. “It’s not the same as being invisible, but 
it would allow you to see others while not being seen,” he says. 

Almost all the scientists in this young field get excited when they start 
dreaming of applications. But Gigan, for one, is keen to keep the applica- 
tions above board. “When we tell people what we do, someone always 
asks if we'll create a phone app to let people look through shower
curtains,” he says. “This is something that could be done with our technique,
but we don’t intend to do it.”


Zeeya Merali is a freelance writer in London.


A software glitch at trading firm Knight Capital caused losses of US$440 million in a single day in 2012.


Trading at the speed of light 


To minimize risks, we must learn more about how financial markets 
operate at ever faster rates, urges Mark Buchanan. 


Financial traders are in a race to make transactions ever faster. In
today’s high-tech exchanges, firms can execute more than 100,000 trades
in a second for a single customer. This summer, London and New York’s
financial centres will become able to communicate 2.6 milliseconds
(about 10%) faster after the opening of a transatlantic fibre-optic line
dubbed the Hibernia Express, costing US$300 million. As technology
advances, trading speed is increasingly limited only by fundamental
physics, and the ultimate barrier: the speed of light.

Through glass optical fibres, information travels at two-thirds of the
speed of light in a vacuum (300,000 kilometres per second).
To go faster, data must travel through the air.
The corridors between Chicago and New 
York and New Jersey, and between London 
and Frankfurt, are bristling with efficient 
microwave and millimetre-wave links. An
even more efficient network of lasers (based
on military technology for in-flight
signalling between aeroplanes) has been
installed to link the New York and New
Jersey as well as the London and Frankfurt
financial exchanges.

LIGHT: A Nature special issue
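
To put the two-thirds figure in perspective, the short sketch below compares one-way signal delays through glass fibre and through air. The route length is an assumed great-circle distance between London and New York; real cables follow longer paths, so the numbers are indicative only, and the helper name is hypothetical.

```python
C_VACUUM_KM_S = 299_792.458  # speed of light in a vacuum, km per second


def one_way_delay_ms(distance_km, speed_fraction):
    """Delay in milliseconds for a signal moving at a fraction of c."""
    return distance_km / (C_VACUUM_KM_S * speed_fraction) * 1000.0


ROUTE_KM = 5_570.0  # assumption: approximate London to New York distance

fibre_ms = one_way_delay_ms(ROUTE_KM, 2 / 3)  # glass fibre, two-thirds of c
air_ms = one_way_delay_ms(ROUTE_KM, 1.0)      # open air, close to c

print(f"fibre: {fibre_ms:.1f} ms, air: {air_ms:.1f} ms, "
      f"saving: {fibre_ms - air_ms:.1f} ms")
```

Whatever the distance, the fibre delay in this model is 1.5 times the through-air delay, which is why firms look to microwaves, lasers and hollow-core fibres.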

Next up may be hollow-core fibre cables, 
through which light would travel in a tiny air 
gap at light speed. Trading firms speculate 
about a fleet of balloons or uncrewed solar- 
powered drones carrying signal repeaters to 
support a network of links across the oceans. 
In a decade or so, firms may even commu- 
nicate using neutrinos, which travel at the 
speed of light and can go through obstacles, 
including Earth. It all spells big profits for 
high-tech trading firms, which now account 
for around 50% of equity trading in the 
United States and in Europe.

Laser units on rooftops connect New Jersey’s Nasdaq data centre with the New York Stock Exchange.


But some firms claim that uneven access
to extreme speed erodes trading fairness. 
And system-wide failures occur when algo- 
rithms interact in unforeseen ways — such 
as in the ‘flash crash’ of 6 May 2010, when 
the Dow Jones Industrial Average fell by the 
largest daily amount ever within minutes 
(see ‘Flash crash’). No one knows when a 
similar event might spill over into global 
markets. 

Avoiding these risks will require intensive 
research on how markets work — as complex 
ecologies of interacting algorithms — and 
how countermeasures could avert disasters. 


GETTING AHEAD 

High-frequency trading relies on fast 
computers, algorithms for deciding what 
and when to buy or sell, and live feeds 
of financial data from exchanges. Every 
microsecond of advantage counts. Faster 
data links between exchanges minimize the 
time it takes to make a trade; firms fight over 
whose computer can be placed closest; trad- 
ers jockey to sit closer to the pipe. It all costs 
money — renting fast links costs around 
$10,000 per month. 

Communications technology is a limiting 
factor. Fibre-optic cables carry the most data, 
but do not give the speed required. The fast- 
est links carry information over a geodesic 
arc — the shortest path on Earth’s surface 
between two points. So line-of-sight micro- 
waves are a better option; millimetre waves 
and lasers are better yet, because they have 
higher data densities. 

Open-air communications systems are 
prone to weather disruption. Anova Tech- 
nologies, a network provider for trading 
firms headquartered in Chicago, Illinois, has 
augmented its New York laser network with
millimetre waves to overcome rain, fog and
snow. Adaptive alignment mechanisms keep 
the links working even if winds make towers 
twist by up to 3°. But microwaves and lasers 
cannot be used over long distances without 
repeaters. They attenuate quickly in the 
atmosphere and do not curve around Earth.

Some economists question the worth of
such investments. Joseph Stiglitz, a Nobel
laureate in economics, is among those who
argue that rapid trading is socially useless.
High-frequency firms quickly cancel about
95% of the orders they make. Worse, speed
may impede proper market function. The 
traditional purpose of financial markets is 
to pool diverse information from many peo- 
ple to channel investment resources. That 
requires trading based on insight, depth of 
study and patience, all foreign to the
high-frequency algorithm-based system.


GOOD, BAD AND UGLY 

Fast trading has pros and cons. First, it gives 
markets ‘liquidity’ — it makes it easier for 
investors to find trading partners at reason- 
able prices. Liquid markets benefit trade 
in the same way that free-flowing traffic 
helps transport. Such markets tend to have 
low ‘spreads’ — the difference between the 
prices at which one can buy or sell a stock, 
which reflects the fee that dealers demand 
and thus transaction costs for investors. As 
high-frequency trading has grown over the 
past decade, spreads in many markets have 
fallen, making trading cheaper.

Even so, the liquidity that computer trad- 
ing creates is fleeting, and it can fail when 
markets get unruly. Wildly fluctuating prices 
mean bigger risks for traders who earn a living
by ‘market making’: standing ready to
buy or sell stocks at any moment and earning
a profit from the spread. The algorithms they
use to trade profitably make more errors and 
are programmed to get out of the market 
altogether when markets get too volatile. 
The problem is exacerbated by the similar- 
ity of the algorithms used by many high-fre- 
quency trading firms — they all bail out at 
the same time. That is what happened in the 
2010 flash crash. (Of course, this problem 
happens with human traders too, who flee 
markets when they get too scary.) 

Another good thing about high-frequency 
trading is that it helps to synchronize prices 
across markets. It takes time to digest
information, draw out implications and align
prices. If prices of sugar or high-fructose
corn syrup rose, stocks in Coca-Cola would
fall quickly; those of less well-known soft- 
drink companies would take longer. High- 
frequency trade speeds up that process. In 
2000, it took minutes on average for a price
change in one security to flow to others. Now 
it takes less than ten seconds. Not everyone 
likes this: fast synchronization wipes out 
profit opportunities for firms that make 
money by knowing about the momentary 
price imbalances. 


MARKET DYNAMICS 

Some high-frequency firms exploit an anach- 
ronism in the structure of markets. By US law, 
each regulated exchange must feed its best 
available prices for a stock, sale and purchase, 
to a central facility, which uses that informa- 
tion to establish a public National Best Bid 
and Offer (NBBO). But exchanges also sell 
faster proprietary data feeds that firms can 
use to predict the NBBO in advance, gaining
an edge over anyone using the public
information alone. Hence, high-frequency 
firms can move in ahead of slower traders. 
This tends to further synchronize prices. Big
investors such as mutual funds and pension
funds, which act on real-world insight and
information with a long-term view, are
among those that lose out, although they
also benefit from the lower spreads created
by high-frequency traders.

FLASH CRASH
On 6 May 2010, the market value of the Dow
Jones Industrial Average index fell by 9%,
but recovered in minutes. High-speed trading
algorithms were in part to blame.

In the United States, some large trading 
firms have set up private trading spaces 
to eliminate the timing edge for high- 
frequency traders. For example, the alterna- 
tive trading system IEX, launched in 2013, 
aims to stop exploitation of the NBBO. It 
has introduced a trading ‘speed bump’ — 
an automatic delay of 350 microseconds 
— which makes it impossible for traders 
to benefit from the faster feeds. IEX has 
already attracted about 1% of stock-trading 
volume in the United States. Firms in other 
countries may follow suit. 

With computer codes carrying out trades 
with real-world consequences at a rate 
beyond that at which humans can intervene, 
the impacts of coding errors and digital 
glitches can spiral quickly. In 2012, a flaw in 
the algorithms of one of the largest US high- 
frequency trading firms, Knight Capital, 
caused losses of $440 million in 45 minutes as 
its system bought at higher prices than it sold. 

Sudden spikes or ‘fractures’ in the prices 
of stocks are increasingly common. Tens 
of thousands of times in the past few years, 
stock values have changed by 1% in less 
than 0.04 of a second. The flash crash of
2010 happened at around 2.45 p.m. New 
York time, and markets recovered in about 
15 minutes. Had it struck just before clos- 
ing time in New York, the shock would have 
affected markets worldwide and recovery 
would have taken longer. Some investors 
speculate about a ‘splash crash’, in which
a massive spike in one market disrupts or 
freezes trade in foreign exchange, futures, 
commodities, bonds and other assets, poten- 
tially triggering a global economic crisis. 

Some researchers suggest that the spikes
reflect a fundamental transformation of 
market dynamics, linked to the necessity for 
firms to use simple algorithms to maximize 
running speed. 


SYSTEMIC RISKS 

The nature of financial markets today is 
vastly different from that in the past. Rather 
than reflecting the collective decisions of 
people, they mirror the behaviour of complex
webs of technologies and their interactions 
with humans. The potential for global prob- 
lems is increasing as high-frequency trad- 
ing has moved into international markets 
for futures and other assets. No industry,
including energy and food, insurance and
banking, is immune from disruption.

In future, when airborne laser networks
span the oceans, things may get even stranger.
The location at which traders get the
earliest possible information from two
exchanges lies at their mid-point: between
Chicago and London, this is in the middle
of the Atlantic Ocean. At such a site, traders
could exploit a technique called ‘relativistic
arbitrage’ to profit from momentary imbalances
in prices in Chicago and London.

FAST TRADING HOTSPOTS
The speed of light is the ultimate limit to how rapidly trades can be made
between financial centres: it would take signals travelling at this speed
67 milliseconds to travel halfway around the Earth. The midpoints between
exchanges are the best places to site high-frequency trading computers
because they access information from both simultaneously and with the
minimum delay.

To explain: special relativity says that nothing can travel faster than
the speed of light, c. Hence, a trader standing a distance D away from an
exchange can find out what happened there, in the best circumstance, at a
time T = D/c after it happened. Between major trading centres around the
globe, such delays can be from a few to tens of milliseconds. If a trader
stands halfway between the two exchanges, he or she will receive
information from both after the same interval, T = D/c, where D is now half
the distance between them. Anywhere else, the distance to at least one of
the exchanges would be greater and information would take longer to get
there.

In other words, within a few years it may become profitable to station a
ship or other trading platform near halfway points between pairs of
financial centres worldwide (see ‘Fast-trading hotspots’). That said, the
profits earned by high-frequency firms have fallen in recent years,
suggesting that most of the easy opportunities for money-making have
already been taken.
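
The relation T = D/c can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes a spherical Earth of mean radius 6,371 km, approximate city coordinates for Chicago and London, and signals travelling along the surface at the vacuum speed of light. The function names are hypothetical.

```python
import math

C_KM_S = 299_792.458       # speed of light in a vacuum, km per second
EARTH_RADIUS_KM = 6_371.0  # assumption: mean radius of a spherical Earth


def great_circle_km(lat1, lon1, lat2, lon2):
    """Shortest surface distance between two points (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def delay_ms(distance_km):
    """Minimum delay T = D / c, in milliseconds."""
    return distance_km / C_KM_S * 1000.0


CHICAGO = (41.88, -87.63)  # assumed approximate coordinates
LONDON = (51.51, -0.13)

separation_km = great_circle_km(*CHICAGO, *LONDON)
midpoint_ms = delay_ms(separation_km / 2)            # delay to either exchange
half_earth_ms = delay_ms(math.pi * EARTH_RADIUS_KM)  # halfway around the Earth

print(f"Chicago to London: {separation_km:.0f} km")
print(f"delay at the midpoint: {midpoint_ms:.1f} ms to each exchange")
print(f"halfway around the Earth: {half_earth_ms:.0f} ms")
```

A trader at the midpoint waits half as long as one sitting at either exchange would wait for news from the other, which is the timing edge that the argument above relies on.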

If in ten years the wheels of the global 
financial system really will be greased by 
firms signalling from New York to Mel- 
bourne at Einstein’s speed limit, research 
and policy-making should focus on two 
questions. First, how to avoid the biggest 
things that can go wrong; and second, how 
to make markets work as well as they can to 
serve society. 

The first challenge requires more research into the dynamics of markets
that are run by algorithms rather than investors. Computer scientists,
mathematicians and economists need to work together to understand what
drives flash crashes and how changes in market structures might avoid
them. What ‘circuit breakers’, so to speak, might keep events from running
out of control?

Second, researchers and policy-makers need to assess how to regulate
markets to make them serve the purpose of boosting real economic
investment. Algorithmic trading has been given wide latitude for the past
two decades, under the assumption that firms making a profit must be
helping the market. Finance research suggests that there may be an optimal
speed for trading that today’s markets have already far surpassed.


Mark Buchanan is a science writer based 
in the United Kingdom. His latest book is 
Forecast: What Physics, Meteorology and 
the Natural Sciences Can Teach Us About 
Economics. 

e-mail: buchanan.mark@gmail.com 
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IN RETROSPECT 


Book of Optics 


Jim Al-Khalili revisits Ibn al-Haytham’s hugely influential study on its millennium.


The greatest physicist of the medieval era
led a life as remarkable as his discoveries
were prodigious, spending a decade
in prison and at one point possibly feigning 
mental illness to get out of a tight spot. Abu 
Ali al-Hassan ibn al-Haytham (Latinized to 
Alhazen) was born in Basra, now in southern 
Iraq, in AD 965. His greatest and most famous 
work, the seven-volume Book of Optics (Kitab
al-Manathir), hugely influenced thinking
across disciplines from the theory of visual 
perception to the nature of perspective in 
medieval art, in both the East and the West, 
for more than 600 years. Many later European 
scholars and fellow polymaths, from Robert 
Grosseteste and Leonardo da Vinci to Galileo 
Galilei, René Descartes, Johannes Kepler and 
Isaac Newton, were in his debt. Indeed, the 
influence of Ibn al-Haytham’s Optics ranks 
alongside that of Newton's work of the same 
title, published 700 years later. 

Interest in optics began in antiquity. The 
Babylonians, Egyptians and Assyrians all used 
polished quartz lenses. The basic principles 
of geometric optics were laid down by Plato 
and Euclid. They included ideas such as the 
propagation of light in straight lines, and simple
laws of reflection from plane mirrors. The
earliest serious contribution from the Islamic 
world came from ninth-century Arab scholar 
Yaqub ibn Ishaq al-Kindi. 

As a young man, Ibn al-Haytham received 
an excellent education and was widely noted 
as a mathematical and scientific prodigy. 
Frustrated by his administrative duties work- 
ing in a government post in the vast Islamic 
Empire — which at the time stretched from 
India to Spain — he was sacked owing to real 
or, as some speculate, faked mental illness. 

Sometime during the first decade of the 
new millennium, he proposed an ambitious 
project to dam the Nile. He was invited to 
Egypt by the Fatimid caliph al-Hakim biamr 
Illah. However, on seeing the scale of the 
task, Ibn al-Haytham quickly realized that it 
was beyond him. He was promptly impris- 
oned in Cairo for wasting the caliph’s time. 

Far from cowing him, the decade of 
imprisonment granted Ibn al-Haytham the 
seclusion to think and write, particularly on 
optics. After his release around the year 1020, he began working at
a prolific rate, carrying out a series of famous experiments on the
nature of light. For example, using a camera obscura, he
proved that light travels in straight lines; he
also mathematized the fields of catoptrics
(reflection of light by mirrors) and dioptrics
(refraction of light through lenses). This huge
body of experiment and theory culminated
in his Book of Optics.

Ibn al-Haytham’s work long pre-dated Newton’s.

This treatise can be regarded as a science 
textbook. In it, Ibn al-Haytham gives 
detailed descriptions of his experiments, 
such as exploring how light rays are reflected 
off plane and curved surfaces. He includes
the apparatus he used, the way he set it up, 
the measurements and his results. He then 
uses these observations to justify his theo- 
ries, which he develops with geometrical 
models. He even urges others to repeat 
his experiments to verify his conclusions. 
Many historians of science consider Ibn 
al-Haytham to be the first true proponent of 
the modern scientific method. 

The work can be roughly divided into
Books I, II and III, devoted to the theory of
vision and the associated physiology of the
eye and the psychology of perception; and
Books IV to VII, covering traditional physical
optics. The work’s most celebrated contribution
to science is its explanation of vision.

At that time, scholars’ understanding of 
the phenomenon was a mess. The Greeks 
had several theories. In the fifth century BC,
Empedocles had argued that a special light 
shone out of the eye until it hit an object, 
thereby making it visible. This became 
known as the emission theory of vision. It 
was ‘refined’ by Plato, who explained that 
you also need external light to see. Plato’s stu- 
dent Aristotle suggested that rather than the 
eye emitting light, objects would ‘perturb’ 
the air between them and the eye, triggering 
sight. Other philosophers around this time, 
including Epicurus, attempted a form of
‘intromission’ theory of vision (light entering
the eye from outside), but it was Plato’s
theory that was given a mathematical basis
by Euclid, who described light rays emerging
in a cone from the eye. Several centuries
later, Ptolemy expanded on this idea. 

Early Islamic scholars such as al-Kindi and 
Hunayn ibn Ishaq favoured a combined emis- 
sion-intromission theory. They posited that 
the eye sends out light to the observed object, 
which then reflects the light back into the eye. 

It took the genius of Ibn al-Haytham to 
finally resolve the issue. He argued that if we 
see because rays of light are emitted from the 
eye onto an object (Plato and Euclid’s ‘sight 
rays’), then either the object sends back a signal
to the eye or it does not. If it does not, how
can the eye perceive what its rays have fallen 
on? Light must be coming back to the eye, and 
this is how we see. But if so, what use is there 
for the original rays emitted by the eye? The 
light could come directly from the object if it 
is luminous or, if it is not, could be reflected 
from the object after being emitted by another 
source. Rays from the eye, decided Ibn al- 
Haytham, are an unnecessary complication. 

He also went further than anyone before 
in trying to understand the underlying phys- 
ics of refraction. He argued that the speed of 
light was finite and varied in different media, 
and he used the idea of resolving the path
of a light ray into its vertical and horizontal
velocity components. He carried out all
his work geometrically, and introduced many
new ideas, such as the study of how the atmosphere
refracts light from celestial bodies.

Later Islamic scholars, including the
thirteenth-century Persians Qutb al-Din al- 
Shirazi and Kamal al-Din al-Farisi, extended 
the Optics. Al-Farisi, who wrote The Revision
of the Optics (Tanqih al-Manazir),
used geometry to arrive at the first correct
mathematical explanation of the rainbow
(at the same time as, but independently of,
the German scholar Theodoric of Freiberg).

The Book of Optics was first translated
into Latin in the late twelfth or early thir- 
teenth century, as De Aspectibus. The English 
philosopher and empiricist Roger Bacon 
then wrote a summary of it, as did his Pol- 
ish contemporary Witelo. It was soon being 
cited across Europe. Among the many ideas 
taken up by Ibn al-Haytham’s Latin-reading 
disciples was that pure light was not visible, 
and that its job was simply to allow us to see 
colour. Even Kepler, who studied Ibn al-Haytham’s
work, thought this; it took Newton to describe
light as itself being made up of different
colours. (Other erroneous ideas in Optics include
a repetition of Ptolemy’s mistaken law of refraction,
and an incorrect understanding of reflection as a
more intense form of refraction.)

“Many historians of science consider Ibn al-Haytham to be the first true proponent of the modern scientific method.”

Ibn al-Haytham’s work decisively influ- 
enced the theory of perspective that flowered 
in Renaissance European science and art. 
De Aspectibus was translated into Italian in 
the fourteenth century, making it accessible 
to practitioners such as the Florentine art 
theorist and architect Leon Battista Alberti, 
author of the 1435 treatise On Painting (Della 
pittura), the sculptor Lorenzo Ghiberti and 
the geometer-artist Piero della Francesca. 
They harnessed Ibn al-Haytham’s discussions
on perspective to help to create the illusion 
of three-dimensional depth on canvas and in 
friezes. These revolutionary artists strove to 
understand both the objective world and the 
visual system that determined its appearance. 

Today, as we use laser beams to manipu- 
late atoms, stimulate neurons with light or 
convey information in entangled photons, 
it is worth recalling that the foundations of 
this field were laid down around 1,000 years 
ago by Ibn al-Haytham.


Jim Al-Khalili is a professor of physics at the 
University of Surrey in Guildford, UK, and 
the author of Pathfinders: the Golden Age 
of Arabic Science. 

e-mail: j.al-khalili@surrey.ac.uk 
Books in brief


Hell and Good Company: The Spanish Civil War and the World it Made 
Richard Rhodes SIMON AND SCHUSTER (2015) 

His 1986 The Making of the Atomic Bomb (Simon and Schuster) is
a towering chronicle of modernity. Now historian Richard Rhodes
examines the “little world war”, Spain’s hellish 1936-39 civil 
conflagration (see Nature 494, 34; 2013). As he shows, it was a 
testing ground for medical and technological advances — in blood 
transfusion on one hand, and on the other in airborne warfare, which 
led to the bombing of Guernica immortalized in Pablo Picasso’s 
great painting. Luminaries drawn to the war, he shows, ranged from 
geneticist J. B. S. Haldane to writer George Orwell. 


How to Fly a Horse: The Secret History of Creation, Invention, and Discovery

Kevin Ashton DOUBLEDAY (2015) 

This study of creativity by Kevin Ashton — the technical pioneer behind 
the ‘Internet of Things’ — is a testament to Thomas Edison’s definition 
of genius (1% inspiration, 99% perspiration). Science, Ashton argues, 
is less a thing of “eureka shrieks” than of hard work, small steps and 
understanding of adversity. His case studies compel, from Réunion 
slave Edmond Albius’s 1841 breakthrough in vanilla pollination to 
surgeon Judah Folkman’s “series of repetitive failures” that led to the 
discovery of angiogenesis, now key to cancer treatments. 


Future Arctic: Field Notes from a World on the Edge 

Edward Struzik ISLAND (2015) 

In September 2014, dwindling sea ice forced some 35,000 walruses 
onto the Alaskan coast — just one ecological event in a multitude 
besetting the ‘climate-changed’ Arctic. Journalist and explorer 
Edward Struzik cogently analyses the environmental and policy 
challenges, drawing on research into past extinctions and present 
disruptions such as tar-sand exploitation, military territorialism and 
tundra fires. As he ticks off the costs to indigenous peoples, ocean 
biodiversity, caribou habitat and more, the case for an Arctic treaty 
and serious conservation efforts becomes ever clearer. 
Words Onscreen: The Fate of Reading in a Digital World

Naomi S. Baron OXFORD UNIVERSITY PRESS (2015) 

For every digital devotee clutching an e-reader, there is an old-school 
bibliophile brandishing a physical book. But which works best for 
reading comprehension? In this thoughtful study, linguist Naomi 
Baron investigates each platform in the light of recent research,
and surveys US, Japanese and German reading habits. E-readers,
she finds, democratize access and offer easy storage, but can also 
discourage tackling more involved texts or rereading, and encourage 
“power browsing” rather than perusal. She recommends allowing 
room for both options — letting “form follow function”. 


What Nature Does For Britain 

Tony Juniper PROFILE (2015) 

Part research round-up, part manifesto, this treatise on Britain’s 
‘natural capital’ is a model of pragmatism. As environmentalist Tony 
Juniper shows, UK ecosystems were valued at £1.6 trillion (US$2.4 
trillion) in 2011 by the Office for National Statistics. Yet poor practices 
such as overfishing and soil degradation are breaking nature’s bank. 
Juniper offers smart policy action points for switching to sustainability, 
and ingenious case studies — from ‘woodland system’ farming to 
reintroduced beavers that help with riverine flood control. Barbara Kiser 
Greek politics stall
research reforms 


The ongoing damage to Greek 
scientific research is not solely 
due to austerity measures 
(Nature 517, 127-128; 2015). 
In my experience as a member 
of Greece's National Council 
for Research and Technology 
from 2010 to 2014, political 
manipulation and institutional 
weakness are also contributors. 
To address the dire problem of 
underfunded research, in 2011 
the council introduced an open, 
competitive grant scheme (called 
Aristeia, or excellence) based on 
the European Research Council 
model. It ran for two rounds, 
during which we had to battle 
against other governmental forces 
to maintain its European Union 
(EU) funding. The scheme now 
seems to have been abandoned. 
The council developed a multi- 
annual plan that year for research 
and development (R&D) to bring 
Greece closer to EU expenditure 
targets by 2020. This was stalled 
and diverted by government. We 
pressed for the creation of a
high-level government committee 
to oversee R&D, and for a 
research agency similar to the 
US National Science Foundation. 
That plan was also lost, diluted by 
the research law you mention. 
The council's experience 
reflects the wider problems of 
Greece's government: how it seeks 
and receives expert advice, the 
public status of this process and 
the near-impossibility of rational, 
stable long-term planning. The 
shallow and short-term strictures 
of the ‘troika’ — the three 
organizations that act for Greece's 
creditors — make matters worse. 
Kevin Featherstone London 
School of Economics & Political
Science, UK. 
k.featherstone@lse.ac.uk 


Leave Brazil’s Red 
List alone 


Brazil’s government has agreed 
to review its updated Red List of 
the country’s threatened marine
species. This review represents
a victory for lobbyists in the
fishing industry. It is not based
on new biological information. 

The list that was issued in 
December 2014 by Brazil's 
environment ministry (through 
decrees 444 and 445) was the 
culmination of a six-year process
involving 1,300 national and 
international scientists, overseen 
by the International Union 
for Conservation of Nature. It 
restricts or bans the capture of 
several commercially valuable 
fish, such as groupers and 
sharks. 

The fishermen’s unions last 
month questioned the criteria 
for inclusion and persuaded 
Helder Barbalho, minister of 
fisheries and aquaculture, and 
environment minister Izabella 
Teixeira to review the list. 

A repeal of decree 445 or an 
amendment of the Annex I list, 
which regulates the capture of 
409 fish species and 66 aquatic 
invertebrates, would be a serious 
setback for conservation and for 
the sustainable management of 
fisheries in Brazil. 

Alexander C. Lees Museu 
Paraense Emílio Goeldi, Belém,
Pará, Brazil.
alexanderlees@btopenworld.com 


EU research plan 
may widen gaps 


The success of the European 
Union (EU) Research and 
Innovation programme depends 
on achieving critical mass 
among member states and 
optimizing each state's research 
contribution (see also M. Zylicz 
Nature 517, 438; 2015). This 
could be difficult, given the 
wide variation in each state's 
willingness to participate and in 
their investment in research. 
Research excellence and 
competitiveness remain 
concentrated in just a few 
geographical areas, despite 
efforts by the EU to promote 
homogeneity. It is those regions 
that make the advances in 
research and technology, fuelling
the imbalance (see K. Schwab
(ed.) The Global Competitiveness 
Report 2013-2014 World 
Economic Forum, 2013). 

The EU plan to align 
national research programmes 
could make matters worse. 
Closer cooperation between 
researchers and between states 
will help to secure research 
sponsorship and collaboration 
with scientists outside Europe. 
But these advantages are more 
likely to be enjoyed by high- 
performing countries, further 
widening the gap from the 
others. The proposed alignment 
will also have to struggle with 
extra bureaucracy and delays 
(M. Cuijpers et al. Res. Policy 40, 
565-575; 2011). 
Pier Francesco Moretti National 
Research Council of Italy, 
Brussels, Belgium. 
pierfrancesco.moretti@cnr.it 


Biodiversity: include 
freshwater species 


The omission of freshwater 
species from your biodiversity 
assessment (Nature 516, 
158-161; 2014) reflects a 
more general bias towards 
terrestrial conservation, born
of insufficient knowledge about 
freshwater ecosystems. The Red 
List of the International Union 
for Conservation of Nature, 
for example, is dominated by 
freshwater fish species whose 
population status is unknown. 
Integrative conservation 
measures are particularly 
important in places where 
people depend on freshwater 
resources for subsistence, 
and where human activities 
are rapidly changing rivers, 
lakes and their surrounding 
landscapes. The highly 
biodiverse Amazon River 
basin is an example. In parts of 
Africa, diminishing supplies of 
freshwater fish have led to the 
overexploitation of terrestrial 
animals (J. S. Brashares et al. 
Science 306, 1180-1183; 2004).
We need more data for
freshwater ecosystems to inform
conservation strategies and to
integrate them with terrestrial 
habitats. 

Sebastian Heilpern University of 
Chicago, Illinois, USA. 
sheilpern@uchicago.edu 


Biodiversity: sharks 
and rays in peril too 


Your status report on fauna 
biodiversity (Nature 516, 
158-161; 2014) overlooks a 
group that is causing serious 
concern among conservationists 
— sharks, rays and chimaeras. 
These are particularly vulnerable 
to fishing and by-catch, in part 
because they mature late and 
produce few young. 

An estimated 24% of this 
group, known as chondrichthyan 
fish, are threatened with 
extinction under the Red List 
criteria of the International 
Union for Conservation of 
Nature. This exceeds the 
percentage for birds and is 
comparable to that for mammals. 
There are insufficient data 
to determine status in 47% 
of chondrichthyan fish, and 
models predict that many of 
these could also be under threat, 
given their similar life history 
and morphology to the listed 
chondrichthyans. 

Extinction of ocean fish is 
hard to verify. There is as yet no 
documented global extinction 
of a chondrichthyan, but 
many populations are locally 
or regionally extinct (such as 
sawfishes (Pristidae family); see 
N.K. Dulvy et al. Aquat. Conserv. 
http://doi.org/zkc; 2014). Some 
critically endangered species, 
including the Pondicherry shark 
(Carcharhinus hemiodon) in the 
Indo-West Pacific, have not been
recorded in decades and may 
already be extinct. 

Peter M. Kyne Charles Darwin 
University, Darwin, Australia. 
Nicholas J. Bax CSIRO, 
Australia; and University of 
Tasmania, Hobart, Australia. 
Nicholas K. Dulvy Simon Fraser 
University, Burnaby, Canada. 
peter.kyne@cdu.edu.au 
OBITUARY


Hubert Markl 


(1938-2015) 


Biologist who steered German research organizations through reunification. 


Hubert Markl had an extraordinary
impact on research in Germany
before, and crucially during, the
turbulent process of reunification. An 
evolutionary biologist and behavioural 
scientist, he was also a writer, public intel- 
lectual and policy-maker. His stints as 
president of the German Research Foun- 
dation (1986-91), the Berlin-Brandenburg 
Academy of Sciences and Humanities 
(1993-95), and the Max Planck Society 
(1996-2002) shaped the entire German 
and European research systems. 

Markl died on 8 January, aged 76. He 
was born in Regensburg, southern Ger- 
many, in 1938. Although he had an early 
interest in the humanities, Markl stud- 
ied biology, chemistry and geography at 
Ludwig Maximilian University in Munich. 
His teachers included luminaries such as 
the behavioural scientists Martin Lindauer, 
Konrad Lorenz and Karl von Frisch and 
the zoologist Hansjochem Autrum. He got 
his doctorate in zoology aged 24. 

During the early 1960s, Markl held 
several research posts in the United States: 
at Harvard University in Cambridge, Massa- 
chusetts, Rockefeller University in New York 
and the Tropical Research Station of the New 
York Zoological Society (where colleagues 
called him Jim). He returned to Germany, 
to the Goethe University Frankfurt. In 1967, 
he submitted a thesis on the communication 
behaviour of social insects to acquire his 
lecturing qualification. 

In 1968, he became professor and direc- 
tor of the zoological institute at Darmstadt 
University of Technology. Markl recalled this 
appointment as the most crucial and suc- 
cessful of his life. It gave him the freedom to 
pursue research interests from evolutionary 
biology and behavioural ecology to sen- 
sory physiology and conservation. In 1974, 
Markl moved to the University of Konstanz, 
founded eight years before to revive the 
Humboldtian ideal of research-based teach- 
ing. He became one of the leading figures of 
‘Little Harvard on Lake Constance’.

That year, Markl was also elected senator 
of the German Research Foundation, the 
nation’s main public funding agency for basic 
research. After a six-year stretch as vice-presi- 
dent, he became its youngest ever president in 
1986. Of his many achievements there, three
stand out: his implementation of long-term
grants; the introduction of a structured programme
for doctoral training and research;
and the opening up of funding opportunities
for East German researchers well before uni- 
fication was agreed on in the autumn of 1990. 

Next, Markl became deeply involved in 
unifying Germany’s two higher-education 
and research systems that had headed in dif- 
ferent directions after the Second World War. 
In West Germany, teaching and research were 
combined in a federal system where each 
state had a lot of independence. The East 
had adopted the Soviet model of universities 
tooled mainly for teaching and specialist insti- 
tutes focused on research. In 1993, Markl’s 
task as founding president of the Berlin- 
Brandenburg Academy of Sciences and 
Humanities was to attract the best research- 
ers from the former East and West to become 
active members of a unified academy. He 
forged joint working groups through which 
the best minds came to trust each other. 

Markl faced much bigger challenges 
when he took the helm of Germany’s Max 
Planck Society in 1996. He was the first and 
so far only president recruited from outside 
the organization. The society had planned 
18 new institutes in eastern Germany. But 
owing to the government’s severe under- 
estimation of the costs of unification, the 
organization did not get the funds it needed. 
Hard decisions were required. 

It was clear to Markl that savings had to be 
made at existing institutes in the west, and 
more resources transferred to the new insti- 
tutes in the east. This controversial policy 
quickly earned him a tough reputation,
particularly when he decided to close 
underperforming and outmoded depart- 
ments, as well as entire Max Planck 
institutes, such as the one for history in 
Göttingen and for cell biology at Ladenburg,
near Heidelberg. The closures were
resisted by the affected state governments. 
With his sharp intellect and his talent for 
communication, Markl prevailed and 
rejuvenated the Max Planck Society. 

During his term, 153 new directors out 
of the society's 266 were appointed. As a
result of a root-and-branch evaluation of 
the society, Markl improved the institutes’ 
links with neighbouring universities, such 
as Göttingen, Munich and Heidelberg. In
2000, he started the International Max 
Planck Research Schools programme. 
The scheme has attracted several thou- 
sand young scholars from abroad to study 
in Germany and continues to build bridges 
across institutional boundaries. Many 
Max Planck directors have become closely 
involved in training doctoral students as well 
as in the teaching and research activities at the 
respective German partner universities. 

Markl spoke truth to power on topics 
including genetic engineering, cloning and 
stem-cell research. He was also outspoken 
against xenophobia and in favour of inter- 
cultural learning and the right to medically 
assisted suicide. 

In 1997, he initiated an independent study 
of the history of the Kaiser Wilhelm Society 
(the predecessor of the Max Planck Society 
from 1911 to 1946) during the Third Reich. 
In 2001, as a result of this research, he publicly
acknowledged the guilt of its members
participating in the expulsion of Jewish 
colleagues and other Nazi atrocities, and 
apologized to survivors at a commemora- 
tion ceremony. 

“Responsibility does not rest with science 
as such,” Markl repeatedly told his students, 
“it is always the individual scientist.” He will
long be remembered as a visionary, a bril- 
liant intellectual and courageous leader. 
Without him, scholarship and science in 
Germany and beyond would not be what 
they are today.
The oldest cosmic light


The cosmic microwave background is a faint glow of light left over from the Big Bang. It fills the entire sky and 
records the Universe’s early history. Two independent experts outline what we know about this ancient light, 


both theoretically and observationally. 


Life history of 
the photon 


The oldest photons in the Universe are the
cosmic background radiation (CBR). These
photons are fossils formed in the first hours 
after the Big Bang. In the first 300,000 years 
of cosmic history, protons, ions and electrons 
formed a dense plasma. The CBR photons scat- 
tered off these electrons just like light scatters off 
a dense fog. During this early epoch, any form 
of energy injection would have produced fur- 
ther photons and distorted the energy spectrum 
of the CBR. The COBE satellite's measurement 
of this spectrum, which showed no detectable 
deviation from the thermal (black-body) form
and was one of the achievements that led to the 
2006 Nobel Prize in Physics, constrains the Uni- 
verse’s early history and is one of the pillars of 
the Big Bang theory. 

During the next 100,000 years, electrons and 
protons combined to make neutral hydrogen. 
Because hydrogen is transparent to the CBR 
photons, these photons could start to propagate
freely. They have travelled for about
13.8 billion years, the approximate age of
the Universe. By the time they reach our 
detectors, they have redshifted to microwave 
wavelengths, and so we observe them as 
the cosmic microwave background (CMB) 
radiation. The temperature and polarization 
patterns of the CMB bear the signatures of 
the photons’ last interactions with electrons; 
the polarization describes the direction of the 
electric fields that the photons carry. 
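
As a rough numerical illustration of the redshifting described above, the sketch below scales a black-body temperature by (1 + z) and applies Wien's displacement law to find where the glow peaks. The present-day temperature of 2.725 K and the redshift of about 1,100 for the epoch when hydrogen became neutral are standard textbook values assumed here, not figures quoted in this article, and the function names are purely illustrative.

```python
# Illustrative sketch: how cosmic expansion shifts a thermal glow to microwaves.
# Assumed inputs (not from the article): T0 = 2.725 K today and z ~ 1100
# for the epoch when the Universe became transparent.

WIEN_B = 2.897771955e-3  # Wien displacement constant, in metre-kelvin


def temperature_at_redshift(t_today_k: float, z: float) -> float:
    """Black-body radiation temperature scales as (1 + z)."""
    return t_today_k * (1.0 + z)


def peak_wavelength_m(temperature_k: float) -> float:
    """Wavelength of peak emission per unit wavelength (Wien's law)."""
    return WIEN_B / temperature_k


T0 = 2.725                  # assumed present-day temperature, kelvin
Z_LAST_SCATTERING = 1100.0  # assumed redshift of last scattering

t_then = temperature_at_redshift(T0, Z_LAST_SCATTERING)
print(f"Temperature at last scattering: {t_then:.0f} K")
print(f"Peak wavelength then: {peak_wavelength_m(t_then) * 1e9:.0f} nm")
print(f"Peak wavelength now: {peak_wavelength_m(T0) * 1e3:.2f} mm")
```

Under these assumptions the radiation starts at roughly 3,000 K, peaking near one micrometre, and arrives today peaking near one millimetre, at the short-wavelength end of the microwave band.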

According to the most popular cosmological 
model, the early Universe underwent a period 
of exponential expansion called inflation. Dur- 
ing this inflationary expansion, the volume of 
the Universe grew by more than 180-expo- 
nential-fold, and tiny fluctuations in the light’s 
quantum field were amplified into density fluc- 
tuations on the scales of billions of light years. 
These density fluctuations generated sound 
waves that propagated through the early Uni- 
verse. The sound waves produced a distinctive 
pattern of ripples that have been seen not only 
in the measurements of temperature fluctua- 
tions in the CMB by the Wilkinson Microwave 
Anisotropy Probe (WMAP), the Planck
satellite and ground-based CMB experiments,
but also in the Sloan Digital Sky Survey
measurements of the large-scale distribution 
of galaxies. 


A remarkably simple model, a Universe 
filled with dark matter, atoms and dark energy, 
and seeded by the inflationary fluctuations, 
beautifully fits these observations. These 
precise measurements determine the basic 
parameters of the cosmos: its age, density, 
shape and composition. The statistical prop- 
erties of the fluctuations provide an extra test 
of inflation. 

The sound waves also produce a distinctive 
pattern of fluctuations in the polarization of 
the CMB. Polarization fluctuations can be 
divided into two types: ‘E-mode’ polarization 
— patterns that are symmetric under mirror 
reflection and could be created by the vari- 
ations in density produced by inflation, and 
swirly ‘B-mode’ fluctuations. The WMAP 
and Planck satellites have seen the predicted 
E-mode patterns, another observational 
triumph for the theory of inflation. 

Despite the remarkable observational 
successes of this theory, there are several pro- 
found unsolved theoretical problems associ- 
ated with the inflationary model. Alternative 
models exist. Our next-best observational test 
of the model is to detect primordial gravita- 
tional waves — ripples in the fabric of space- 
time generated during the Universe’s early rapid 
expansion. Brian Keating’s companion article
describes the distinctive B-mode signature of
the gravitational waves.

Figure 1 | Searching for the origins of the Universe. Several telescopes around the world, including three at the South Pole (the Keck Array, BICEP2 and the
South Pole Telescope; all three shown left) and two in the Atacama Desert in Chile (right; the Atacama Cosmology Telescope towards the top of the picture and
the POLARBEAR telescope in the foreground), are dedicated to observing the cosmic microwave background (CMB), relic radiation from the Big Bang. Each
telescope tackles CMB observations in a different way. The image on the right also shows the future site of the Simons Array, currently under construction.
BRIAN KEATING


David Spergel is in the Department of 
Astrophysical Sciences, Princeton University, 
Princeton, New Jersey 08544, USA. 
Massless
messengers 


BRIAN KEATING 


The CMB is the most perfect black body
known in nature, much better than any
laboratory black-body oven, whose walls 
are almost perfect photon absorbers. These 
ancient CMB photons are also gravimeters. 
Using their intensity and polarization prop- 
erties, we can measure the gravitational field 
of the last scattering surface — the fictitious 
shell formed by the Universe's first hydro- 
gen atoms. Like the walls of an oven, the last 
scattering surface is so absorptive that it is the 
end of the line-of-sight, the farthest we can 
look back, at least using photons. However, the
last scattering surface is also a gravitational- 
wave detector: a thin ‘film’ of matter on which 
primordial gravitational waves can be exposed, 
allowing us to peer back to earlier epochs 
when these waves themselves were produced. 
If inflation produced gravitational waves, 
then the waves will have imprinted a unique 
pattern of B-mode polarization on the CMB.
As described in David Spergel’s companion 
article, if the B-mode polarization pattern 
proves to be of primordial origin, it will be 
strong evidence that inflation occurred. This 
is the goal of nearly a dozen CMB polarimeters 
that are either planned or currently plying 
Southern Hemisphere skies (Fig. 1). 

CMB polarimeters are astonishingly precise. 
Current experimental sensitivities are at the 
level of tens of nanokelvin, unimaginable just a 
decade ago when the hunt for B modes began.
This is due to a ‘Moore’s-law-like’ growth in the 
number of detectors. But these are no ordinary 
smartphone pixels. CMB polarimeters use 
superconducting bolometers, thermal sensors 
cooled below 300 millikelvin. Two bolometers, 
one per polarization state of the CMB, are cou- 
pled to reflecting (mirror-based) or refracting 
(lens-based) telescopes. Although details vary 
by instrument, polarimeters exploit the dif- 
ferential nature of the signal — what matters 
is the difference in microwave power between 
the CMB’s two polarization states. To detect 
this difference, experimentalists cleverly use 


| LIGHT 


A Nature special issue 
nature.com/light2015 


the twofold modulation of polarized signals for 
each single physical rotation about the optical 
axis of the polarimeter. (Try it with polarized 
sunglasses on a sunny day at sunset. Looking 
at the zenith, spin around once: the sky bright- 
ness modulates twice.) 
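
The twofold modulation can be checked numerically. For a partially linearly polarized signal with Stokes parameters I, Q and U, an ideal analyser at angle θ passes a power of ½(I + Q cos 2θ + U sin 2θ). This is the standard textbook relation rather than a description of any particular instrument, and the Stokes values and function names below are invented for illustration.

```python
import math


def analyser_power(i: float, q: float, u: float, theta: float) -> float:
    """Power passed by an ideal linear analyser at angle theta (radians)."""
    return 0.5 * (i + q * math.cos(2 * theta) + u * math.sin(2 * theta))


def count_maxima(samples: list[float]) -> int:
    """Count local maxima in one cyclic sweep of samples."""
    n = len(samples)
    return sum(
        1
        for k in range(n)
        if samples[k] > samples[k - 1] and samples[k] >= samples[(k + 1) % n]
    )


# Hypothetical Stokes parameters: 10% linear polarization.
I, Q, U = 1.0, 0.1, 0.0

# Sweep the analyser through one full physical rotation in 1-degree steps.
N = 360
sweep = [analyser_power(I, Q, U, 2 * math.pi * k / N) for k in range(N)]
print("Brightness maxima per full rotation:", count_maxima(sweep))

# Differencing two orthogonal analysers cancels the unpolarized part I,
# leaving only the polarized signal Q cos 2θ + U sin 2θ.
theta = 0.3
diff = analyser_power(I, Q, U, theta) - analyser_power(I, Q, U, theta + math.pi / 2)
expected = Q * math.cos(2 * theta) + U * math.sin(2 * theta)
print("Difference signal:", diff, "polarized term:", expected)
```

The sweep shows two brightness maxima per turn, and the difference of the two orthogonal channels depends only on Q and U, which is why a differential measurement is insensitive to the much larger unpolarized background.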

But experimental challenges run deeper 
than raw sensitivity. Systematic effects mas- 
querading as primordial gravitational waves 
need be only a few parts per billion of the 
ambient 300 K background to swamp obser- 
vations. Despite this extreme susceptibility, 
astrophysical foregrounds such as dust emis- 
sion from the Milky Way now seem to present 
the most formidable challenge. 

Polarization data obtained with the BICEP2 
telescope in Antarctica were initially interpreted 
as evidence for inflation, but this conclusion
was questioned and recently reinterpreted. A
joint analysis using Planck observations at a fre- 
quency of 353 gigahertz and data from the Keck 
Array and BICEP2 at 150 GHz has shown that 
BICEP2’s original B-mode data were probably
neither purely primordial nor caused by systematics.
Rather, they were potentially dominated by
thermal emission from dust grains in the Milky
Way aligned by Galactic magnetic fields. The
game is still afoot — only experiments observ- 
ing the cleanest celestial regions, with sensitivity 
in multiple frequency bands, can unambigu- 
ously detect the inflationary signal. 

The prospect of detecting gravitational 
waves from the inflationary epoch, a tiny fraction of a
second after the Big Bang, is exhilarating. And
yet, more-exotic physics may lurk undis- 
covered in the CMB’s polarization. As CMB 
photons traverse the cosmos, they trace the 
properties of matter (dark and luminous) 
and the curvature of space-time itself. CMB
photons are lensed; that is, their trajectories 
are bent by gravitational fields produced by 
dark matter. Such CMB polarization lensing 
perhaps offers the best hope of measuring the 
mass of the neutrino, the only elementary
particle whose mass is unknown. 

Ancient photons are illuminating funda- 
mental cosmic mysteries. CMB photons shed 
light on subjects that were once only the pur- 
view of particle colliders: elementary particle 
masses and ultra-high energy fields. The next 
100 years promise to be equally exhilarating, 
thanks to the indomitable, massless messenger 
of the cosmos: the photon.
Death drags down
the neighbourhood 


An analysis of dying cells reveals that they play an active part in modifying 
tissue shape by pulling on neighbouring cells. This induces neighbouring cells 
to contract at their apices, which results in tissue folding. SEE LETTER P.245 


CLAUDIA G. VASQUEZ & ADAM C. MARTIN 


A normal and essential part of tissue
development, maintenance and repair
is apoptotic cell death, in which cells
shrink and fragment into membrane-bound 
structures called apoptotic bodies that are 
engulfed by phagocytic cells. Apoptosis has a 
key role in sculpting tissue morphology, remov- 
ing cells from between forming digits and 
eliminating vestigial structures. However, apoptosis
is often regarded as a passive elimination
of unwanted cells. On page 245 of this issue,
Monier et al. report a surprising finding:
apoptosis can trigger contractions that fold 
tissues. Rather than being inert, apoptotic cells 
actively affect their surroundings, causing last- 
ing changes in tissue form and structure. 
Epithelial tissues line the body’s cavi- 
ties, creating boundaries between different 
extracellular environments. Accordingly, 
epithelial cells are polarized, with one side of 
the cell dubbed apical and the other basal. In 
epithelial tissues, apoptosis in one cell induces
apical contractility in neighbouring cells. The
result is a purse-string-like contraction that
pushes the dying cell from the epithelium.
Furthermore, epithelial cells remain adhered
to their neighbours at specific structures called
adherens junctions while they undergo apoptosis,
inducing neighbouring cells to stretch
and elongate towards the dying cell. Although
apoptosis is also known to be required to generate
tissue-wide shape changes, the link
between this shape change and contractility is
poorly understood.

Figure 1 | Apoptosis and fold formation. Apoptotic cell death in an
epithelial-cell sheet causes folding in the forming leg joints of fruit-fly larvae.
a, Cells in this epithelial sheet are attached to their neighbours at structures
called adherens junctions. The junctions remain intact even as cells undergo
apoptosis. b, Monier et al. report that apoptotic cells in the epithelial sheet

Contractile forces are generated through 
the actin and myosin proteins that make up 
a cell’s cytoskeleton. Actin filaments assemble 
into meshes and bundles that underlie cell 
membranes, whereas myosin is a motor 
protein that forms mini-filaments that both 
connect and ‘walk’ along actin filaments. 
Myosin mini-filaments can contract the actin 
network to generate cellular tension, and this 
contractility can be increased and transmitted 
between cells through adherens junctions in 
response to signalling molecules, promoting 
dramatic changes in tissue shape. 

Such cellular tension is essential for the 
generation and maintenance of specific tissue 
architectures. One well-characterized exam- 
ple of how cell contractility and the result- 
ing change in cell shape can influence tissue 
architecture occurs in the fruit fly Drosophila 
melanogaster. During early embryonic devel- 
opment, a signalling pathway activates the 
actin–myosin cytoskeleton, constricting the
apices of a specific set of epithelial cells. This
folds the epithelial sheet and results in the
movement of muscle-cell precursors inside
the embryo. By contrast, the cytoskeletons of
cells in a mature, static epithelium constantly 
tug on neighbouring cells. This state of tension
is required to maintain an ordered hexagonal
cell array in epithelial tissues. Thus, a fundamental
property of epithelial tissues is that cells
continually exert pulling forces on each other.

172 | NATURE | VOL 518 | 12 FEBRUARY

The leg joints of fruit flies form in the third 
larval stage of the insects’ development, when 
folds in epithelial tissue cause successive rounds 
of tissue subdivision. This process involves 
apoptosis in cells at the centre of the fold and
contraction of the apical side of the surrounding
tissue, which then folds inward towards
the basal side. Monier and colleagues set out to
determine the link between apoptosis and fold
formation in these leg joints. Using live imaging,
they studied a fluorescent version of myosin
in cells undergoing apoptosis, and report that
the protein accumulates along the apical-basal
axis of the apoptotic cell. They also observed 
that the apoptotic cell pulls on its neighbours 
as it contracts and shrinks into the epithelium. 
Consequently, apical myosin levels in neigh- 
bouring cells increase, which causes these cells 
to constrict their apices and form a fold (Fig. 1). 

How does apoptosis cause neighbouring 
cells to constrict? The authors show that inhib- 
iting myosin function in the apoptotic cell 
suppresses the accumulation of myosin at the 
apical edges of surrounding cells, and also suppresses
fold formation. This suggests that the
mechanical pulling force from the dying cell 
could trigger myosin activity in its neighbours. 

Studies have demonstrated that applying
mechanical forces to tissues can elevate
myosin activity. But the mechanism through 
which a pulling force exerted by an apoptotic 
cell could recruit apical myosin in the sur- 
rounding tissue remains unknown. One pos- 
sibility is that the myosin motor itself responds 
to tension, with tension increasing the length 
of time that myosin remains bound to actin 
shrink and accumulate myosin protein along their apical-basal axis

as they do so. Neighbouring cells also accumulate myosin at their apical 
surfaces, causing elevated tension around the apoptotic cell. c, As the 
dying cell fragments, neighbouring cells apically constrict and form a fold 
in the tissue. 


filaments. Alternatively, tension might affect
signalling factors that regulate activity of the
enzyme Rho kinase (as has been proposed
for folding embryonic epithelia), because
myosin dynamics and spatial organization
are regulated by a balance between myosin
phosphorylation by Rho kinase and dephosphorylation
by phosphatase enzymes. It is
also possible that contractile activity in the
apoptotic cell compromises the integrity of
the cell’s membrane, leading to the release of
a chemical signal, such as ATP, that induces
contraction in neighbours. One experiment
that would directly test whether physical transmission
of tensile force is required to elevate
myosin would be to disrupt adherens-junction
proteins, which are required to mechanically
couple epithelial cells during apoptosis.
How common is the phenomenon of 
apoptosis triggering or contributing to 
morphogenetic movements? Apoptotic cells 
contribute to generating the tension that 
drives epithelial-sheet movement in a fruit-fly 
tissue called the amnioserosa. Monier and
co-workers’ study demonstrates that apoptosis
(and possibly the resulting pulling force)
induces myosin accumulation and fold formation
in three other epithelial tissues. Thus,
apoptotic cells may have a more active role in 
shaping tissues than was previously believed. 
Tissue folding is most often thought to be 
triggered by transcription factors or secreted 
signalling molecules, or both. This study dem- 
onstrates that in some tissues there is in fact 
a relay effect, in which induction of one type 
of cell behaviour can trigger changes in the 
surrounding tissue. The idea that mechani- 
cal signalling can trigger a propagation of 
contractile activity was originally proposed in 
some of the first mechanical models of tissue 


B. SAXTON (NRAO/AUI/NSF) 


folding more than 30 years ago. It is surprising
that this trigger could represent a dying cell’s
final tug.
Sibling rivalry
begins at birth 


High-resolution astronomical observations of a nearby molecular gas cloud have 
revealed a quadruplet of stars in the act of formation. The system is arguably the 
youngest multiple star system detected so far. SEE LETTER P.213 


KAITLIN M. KRATTER 


Roughly 150 years after the invention of
the telescope, astronomers deciphered
the mysteries of double stars. Using
probability arguments, natural philosopher
John Michell suggested in 1767 that pairs of
stars in the sky were in gravitational tangos,
not chance alignments. The early discovery
of such binary stars reflects their numbers:
at least half of stars like the Sun are found in
multiple systems. And yet the origins of this
all-too-normal population are mysterious. On
page 213 of this issue, Pineda et al. report the
discovery of a quadruple star system still in the
‘womb’ (Fig. 1). With these data, we are closing
in on understanding binary-star conception.
Why worry about binary stars versus single 
stars at all? It turns out that binary stars have
an outsize impact on many astrophysical phenomena.
First, binaries allow us to detect stellar-mass
black holes, which are the remnants of
massive stars and can be seen only because of
their gravitational influence on a normal stellar
companion. Second, they generate the type Ia
supernova explosions that are used as ‘standard
candles’ to measure cosmic distances.
Third, we now know that planets can form in 
binaries, notably in the habitable ‘Goldilocks’ 
temperature zone, where the planet is at the 
right distance from the host star to retain at 
least some liquid water. Finally, binaries may 
also be a source of gravitational waves. For 
all of these research areas, characterizing 
the binary population is crucial, and proper 
accounting must start at birth. 
Young stars have more bound companions 
than their older counterparts. This observation
indicates that binary stars form together


at birth, rather than uniting after they grow 
up. Pineda and colleagues’ infant quadruple 
system reinforces this idea. Multiple and single 
stars alike form in cold, filamentary molecular 
gas clouds. Within filaments, we find proto- 
stars forming in dense regions because of the 
triumph of gravitational forces over thermal, 
turbulent and magnetic pressures. The relative 
importance of turbulence and magnetic fields 
in star formation is an open question. Theoretical
models that require turbulence to set
the properties of star-forming regions predict 
that sometimes there is not just one dominant 
dense region in a filament, but two or three, or 
even four. Frequently, these regions turn into
a bound protostellar system. As they continue
to accrete gas and dust from their surroundings
on their way to becoming fully fledged
stars, sometimes sibling rivalry plays a part
in deciding which star gets to be the biggest.
The siblings either work out their differences 
and settle into organized periodic orbits, or, if 
necessary, eject one party to keep the peace. 
Pineda and colleagues have observed arguably
the youngest multiple system so far using
the Karl G. Jansky Very Large Array in New
Mexico and the James Clerk Maxwell Telescope 
in Hawaii. They discovered four distinct dense 
gas condensations in a nearby molecular cloud, 
only one of which hosts a protostar. All four 
objects are incredibly young in astronomical 
terms, probably less than 10° years old (based 
on their densities and temperatures). 
Detecting systems at birth is extremely 
difficult on multiple counts. First, the star- 
formation timescale is only a few hundred 
thousand years, which is short compared 
to stellar lifetimes of billions of years. Thus, 
the probability of ‘catching’ star systems in 
the act of formation is small. Second, these 
siblings can be seen through their molecular 
womb only with observations that have high 


Figure 1 | A young quadruplet. Pineda et al. have discovered four distinct gas condensations in a
clumpy, filamentary gas cloud (white) surrounded by dust (blue). The locations of the condensations
in this image are marked with black and red dots. The four condensations are destined to form a bound
multiple star system, and one of them (red dot) has already ‘turned on’ as a protostar.
sensitivity and high spatial resolution to separate
one body from the next. Given the relative
rarity of quadruple star systems at older ages,
one might think this discovery improbable, or
lucky. On the contrary, it supports predictions
that most stars begin their lives in a litter.

The projected separations between these 
bodies are of the order of 1,000-10,000 times
the Earth-Sun distance, consistent with theoretical
models. Significantly, these separations
are smaller than the Jeans length in the
cloud. This is the length scale on which the 
cloud would be expected to fragment on the 
basis of the competition between gravity and 
thermal pressure. The scale of fragmentation 
is a strong indication that turbulent motions in 
the cloud played a part, which is a significant 
boost to the turbulent-fragmentation model 
of star formation described above. Of course, 
the discovery of one system does not preclude 
other modes of binary formation. 
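For orientation, the thermal Jeans length mentioned above can be estimated from the temperature and density of the gas. The sketch below is not taken from Pineda et al.; it uses the standard textbook expression (sound speed multiplied by the square root of pi over G times density), and the temperature, number density and mean molecular weight are illustrative assumptions rather than measured values for this cloud.

```python
import math

# Physical constants in SI units.
G = 6.674e-11            # gravitational constant, m^3 kg^-1 s^-2
K_B = 1.380649e-23       # Boltzmann constant, J K^-1
M_H = 1.6735e-27         # mass of a hydrogen atom, kg
AU = 1.495978707e11      # Earth-Sun distance, m


def jeans_length_au(temperature_k, number_density_cm3, mu=2.33):
    """Thermal Jeans length in units of the Earth-Sun distance.

    temperature_k      -- gas temperature in kelvin (assumed value)
    number_density_cm3 -- particle number density per cubic centimetre (assumed value)
    mu                 -- mean molecular weight; 2.33 is a common assumption
                          for molecular gas, not a value from the article
    """
    rho = mu * M_H * number_density_cm3 * 1e6                 # mass density, kg m^-3
    sound_speed = math.sqrt(K_B * temperature_k / (mu * M_H))  # isothermal sound speed, m/s
    return sound_speed * math.sqrt(math.pi / (G * rho)) / AU


if __name__ == "__main__":
    # Hypothetical cold, dense gas: 10 K and 10^4 particles per cm^3.
    print(f"Jeans length: {jeans_length_au(10.0, 1e4):.0f} Earth-Sun distances")
```

In this expression the Jeans length grows with the square root of temperature and shrinks with the square root of density, so fragments packed more closely than this scale point to something other than thermal pressure, such as turbulence, shaping the cloud.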

In addition to its youth, this system is 
remarkable because it is convincingly gravi- 
tationally bound. In general, it is challeng- 
ing to measure each of the quantities that go 
into the calculation for assessing gravitational 


DNA REPAIR 


binding: the total mass of the system, the 
velocities of its members and their separa- 
tions. The uncertainty in mass is introduced 
by unknown quantities such as the ratio 
of dust grains to hydrogen gas, or the ratio of the 
observed molecule, ammonia, to the total 
gas mass. Velocities and distances are both 
difficult to determine because we see only two- 
dimensional projections of three-dimensional 
quantities. Even folding in these uncertainties, 
Pineda and co-workers demonstrated that, as 
with Michell’s double stars, these four are not 
just chance alignments. 
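The three ingredients named above (mass, velocity and separation) enter the simplest possible binding test, in which a pair is bound if its relative speed is below the escape speed at its separation. The sketch below illustrates only that two-body criterion; it is not the analysis used by Pineda and co-workers, the function names are hypothetical, and the input values are invented for illustration. Because only projected separations and line-of-sight velocities are observed, any such estimate carries the uncertainties described in the text.

```python
import math

GM_SUN = 1.32712440018e20   # gravitational parameter of the Sun, m^3 s^-2
AU = 1.495978707e11         # Earth-Sun distance, m


def escape_speed_kms(total_mass_msun, separation_au):
    """Escape speed (km/s) for two bodies of given total mass and separation."""
    separation_m = separation_au * AU
    return math.sqrt(2.0 * GM_SUN * total_mass_msun / separation_m) / 1e3


def is_bound(total_mass_msun, separation_au, relative_speed_kms):
    """True if the relative speed is below the two-body escape speed."""
    return relative_speed_kms < escape_speed_kms(total_mass_msun, separation_au)


if __name__ == "__main__":
    # Hypothetical pair: one solar mass in total, 5,000 Earth-Sun distances apart.
    print(f"escape speed: {escape_speed_kms(1.0, 5000.0):.2f} km/s")
    print("bound at 0.3 km/s:", is_bound(1.0, 5000.0, 0.3))
    print("bound at 1.0 km/s:", is_bound(1.0, 5000.0, 1.0))
```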

Many questions remain about this system 
and others like it. We cannot predict whether 
the quadruplets will remain together or break 
up to form a triplet and a singlet. We do not 
know what the final orbital configuration will 
be, nor the ratio of the masses of the compo- 
nents. The ability to make these predictions, at 
least statistically, is crucial for connecting these 
infants with the descendants that conspire to 
explode as supernovae. And perhaps the big- 
gest question for the field remains unanswered. 
What determines which star gets a sibling and 
which does not? The influx of data from the 


Familiar ends with 
alternative endings 


The faithful propagation of species requires a complex balance of DNA-repair 
pathways to maintain genome integrity. New work sheds light on one such poorly 
understood pathway and its role in certain cancers. SEE LETTERS P.254 & P.258 


NAM WOO CHO & ROGER A. GREENBERG 


The repair of double-strand breaks in
DNA is crucial for life, and can proceed
by several mechanisms. How are these
repair pathways coordinated, and what happens
when certain pathways are disabled? This
question is especially pertinent because DNA-repair
systems are deficient in many types of
cancer; inhibition of complementary repair
pathways might therefore be used to kill such
cancer cells. Two papers in this issue, and an
accompanying report published online in
Nature Structural and Molecular Biology, reveal 
how a poorly understood alternative repair 
pathway compensates when the predominant 
mechanisms are impaired. The findings help to 
explain the causes of genomic rearrangements 
that are seen in cancer cells, and suggest thera- 
peutic opportunities for cancers that harbour 
DNA-repair deficiencies. 
Double-strand-break repair mostly occurs 
through mechanisms called homologous 
recombination (HR) and non-homologous 


end-joining (NHEJ; Fig. 1). HR uses DNA ends
that have been ‘resected’ to produce 3’ single- 
stranded DNA overhangs, which probe for 
complementary strands — homologous DNA 
sequences — that act as templates for DNA 
repair. NHEJ does not require homologous 
DNA, and preferentially acts on ‘blunt-ended’ 
DNA breaks that do not have large overhangs. 

But an alternative form of end-joining has 
been described that lies somewhere between
HR and NHEJ. Most alternative end-joining
seems to rely on microhomologies (short
sequences of a few homologous base pairs)
and thus is often referred to as
microhomology-mediated end-joining (MMEJ).
Unlike HR, MMEJ is inherently error-prone
because the use of microhomology leads 
to repair events that produce deletions of 
sequences from the strand being repaired and 
rearrangement of sequences between pairs of 
chromosomes, known as translocations. 

So how does MMEJ fit into the larger 
picture of double-strand-break repair? Mateos-Gomez
et al. (page 254) have addressed this
current and next generation of telescopes such
as the Atacama Large Millimeter/submillimeter
Array and the James Webb Space
Telescope will surely help to solve this puzzle.
question by exploring the nature of DNA
sequences at the junction of two fused telomeres
(repetitive DNA sequences that cap the
ends of chromosomes). Such joining between
the ends of different chromosomes is undesirable,
and is normally suppressed. However,
telomeres can fuse in the absence of protective
mechanisms, including the absence of NHEJ.
When the authors sequenced mouse telomere
fusion junctions in an NHEJ-deficient system,
they observed seemingly random, permuted
sequences attributable to MMEJ. They found
that an enzyme called Polθ (encoded by the
Polq gene in mice) promotes this process in
mammalian cells.

Polθ is an atypical member of the DNA
polymerase family of enzymes, which catalyse
the synthesis of chains of nucleic acids. It contains
both a helicase-ATPase domain, which
separates the strands of a DNA helix using
energy from the hydrolysis of ATP molecules,
and an error-prone polymerase domain that
can extend DNA strands from mismatched
or unmatched termini. Previous work had
demonstrated a role for Polθ in MMEJ in the
fruit fly Drosophila melanogaster. Mateos-Gomez
and colleagues’ finding in mice therefore
indicates that this role is evolutionarily
conserved.

Intriguingly, the researchers show that 
MMEJ and HR are competing pathways, as
evidenced by their observation of increased
Rad51-dependent HR in cells lacking
Polθ; Rad51 is a protein that performs HR-mediated
repair. Furthermore, when the
authors reduced Polθ levels in HR-deficient
mouse cells harbouring mutations in either 
of the breast-cancer susceptibility genes 
Figure 1 | Repair mechanisms for double-strand breaks in DNA. a, Blunt-ended double-strand
breaks (DSBs) can be repaired by non-homologous end-joining (NHEJ). b, Resection of blunt ends
generates 3’ overhangs, allowing additional repair mechanisms. c, In homologous recombination (HR),
further resection enables binding of Rad51 proteins, which help the overhang to bind to complementary
(homologous) DNA in another DNA helix, forming a structure called a D-loop. The homologous DNA
is used as a template for repair. d, A third mechanism, microhomology-mediated end-joining (MMEJ), is
promoted by the enzyme Polθ. Kent et al. report that Polθ molecules bind to pairs of resected ends, enabling
short homologous DNA sequences in overhangs to form base pairs, leaving adjacent regions unpaired. The
enzyme then extends each strand (red arrows) from the base-paired region using the opposing overhang as
a template, and the unpaired regions are removed. Mateos-Gomez et al. and Ceccaldi et al. find that Polθ
also impedes HR by limiting Rad51 accumulation at resected ends. Both groups also show that HR-deficient
cancers might rely on Polθ-mediated Rad51 inhibition and MMEJ for survival.


Brca1 or Brca2 (or their human equivalents
BRCA1 and BRCA2), this greatly exacerbated
chromosomal aberrancies and reduced the
ability of the cells to survive, suggesting that
Polθ-mediated MMEJ could compensate
for the loss of HR. These findings provide
enticing mechanistic insights into the observation
that cancer genomes involving BRCA
mutants display characteristic signatures
of MMEJ.

Ceccaldi and colleagues’ study (page 258)
began with the finding that Polθ is overexpressed
in high-grade serous ovarian
cancer, which is characterized by high rates
of HR deficiency and frequent BRCA mutations.
Like Mateos-Gomez et al., they observed
increased HR and RAD51 accumulation at
DNA breaks in human cells depleted of Polθ,
compared with normal cells. The authors also
found that Polθ interacts directly with RAD51,
and they identified several binding motifs for
RAD51 on Polθ, one of which was necessary
and sufficient for this interaction.

The researchers went on to show that both 
the ATPase- and RAD51-binding domains 
contribute to the inhibition of RAD51- 
dependent D-loop formation (a key step in 
the HR pathway), and to the inhibition of HR. 
Interestingly, they observed that Polθ-depleted
cells have a mild hypersensitivity to DNA-damaging
agents, and that this becomes much
more striking when Polθ depletion is combined
with HR deficiency caused by either depletion 
of FANCD2 (a protein involved in HR) or 


BRCA1 mutation. In a clear example of genetic
interaction, the investigators observed that 
mice died as embryos when both Fancd2 and 
Polq genes were mutated, whereas mutation of 
either gene alone allowed the animals to survive 
to adulthood. 

The two studies suggest that inhibition of 
Polθ might be therapeutically useful for HR-deficient
cancers. But what aspect of Polθ
should be inhibited? Ceccaldi et al. found that
the RAD51-binding motifs are required to limit
RAD51 accumulation at breaks and toxicity in
HR-deficient cells, but that the polymerase
domain is not essential. This suggests that the
motifs are the appropriate target on Polθ. However,
Mateos-Gomez et al. report high rates of
HR in mouse embryonic stem cells expressing
Polθ and in which the polymerase domain is
inactivated by mutation, and find that these
cells show reduced survival when Brca1 expression
is low. These observations suggest that the
Rad51-binding motifs and the polymerase
domain of Polθ may each contribute to survival
of HR-deficient cells, although quantification 
of their relative contributions remains to be 
sorted out. Introducing defined mutations in 
each domain in an in vivo model could help to 
address this important issue. 

Other fundamental questions are, how does 
Polθ perform MMEJ, and what are its substrates?
Kent et al. show that human Polθ directly binds
two ends of resected double-strand breaks in a
process that depends on an evolutionarily conserved
loop domain in the enzyme (Fig. 1d).
50 Years Ago


The Medical Research Council
has set up a Brain Metabolism
Research Unit in the Department
of Pharmacology, University of
Edinburgh Medical School. The
Unit, which will be under the
honorary direction of Prof. W.
L. M. Perry, aims to undertake
experimental and clinical studies of
the metabolic pathways of certain
amino-acids and other substances
in the brain and tissue fluids. The
action of psychotropic drugs on
these pathways will be used as
a means of trying to determine
whether there are metabolic defects
in the various psychoses and
ultimately whether such defects can
be corrected.

From Nature 13 February 1965 


100 Years Ago 


In answer to a question as to typhoid 
in the Army, asked in the House
of Commons on February 8,
Mr. Tennant, Under-Secretary of
State for War, said:— “Of the 421 cases 
of typhoid in the present campaign 
among British troops 305 cases were 
in men who were not inoculated 
within two years. In the 421 cases 
there have been thirty-five deaths. 
Of these deaths thirty-four were 
men who had not been inoculated 
within two years. Only one death 
occurred among patients who were 
inoculated, and that man had only 
been inoculated once, instead of the 
proper number of times — namely, 
twice.” This is a marvellous record; 
and no further answer than it 
provides is needed to the inhuman 
efforts made by anti-vaccinationists 
to induce men to object to 
inoculation ... we can only wonder 
at the patience of the British
people in permitting a prejudiced
faction to urge men not to subject 
themselves to a treatment by which 
they save others and themselves 
from suffering and death. 

From Nature 11 February 1915 
This allows microhomologous sequences in the
overhangs to form base pairs. Polθ then extends
each strand from the base-paired region using
the opposing overhang as a template. The
researchers find that Polθ requires partially
resected DNA containing two to six base pairs of
microhomology to perform MMEJ in vitro. This
explains why MMEJ can act on substrates destined
for HR, because both repair mechanisms
require resection.
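The base-pairing step described above can be pictured with a toy search for short stretches in one overhang that are able to pair with the other. The sketch below is an illustration only, not the method of Kent et al.: both overhangs are assumed to be written 5’ to 3’, pairing is assumed to be perfect Watson-Crick and antiparallel, the function names are hypothetical, and the example sequences are invented. The two-to-six base-pair window follows the range quoted in the text.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_microhomologies(overhang_a, overhang_b, min_len=2, max_len=6):
    """List stretches of overhang_a that can base-pair with overhang_b.

    Each result is (start_in_a, start_in_b, segment_of_a). Longer matches
    are listed first. Shorter matches nested inside a longer one are also
    reported, because this toy search does not merge them.
    """
    matches = []
    for length in range(max_len, min_len - 1, -1):
        for i in range(len(overhang_a) - length + 1):
            segment = overhang_a[i:i + length]
            partner = reverse_complement(segment)
            j = overhang_b.find(partner)
            while j != -1:
                matches.append((i, j, segment))
                j = overhang_b.find(partner, j + 1)
    return matches


if __name__ == "__main__":
    # Invented example overhangs, both written 5' to 3'.
    for start_a, start_b, segment in find_microhomologies("TTGACCA", "AATGGTC"):
        print(start_a, start_b, segment)
```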

The three papers provide a beautiful illustra- 
tion of how the multifunctional DNA-repair 
toolkit is coordinated to faithfully preserve 
genome integrity. They also reveal how cancer 
cells may thrive through increased usage of 


GEOCHEMISTRY 


error-prone repair pathways — a strategy 
that enables cellular survival by limiting the 
accumulation of persistent DNA damage, but 
at the expense of genome integrity. Understanding
the mechanisms involved opens up
targets for anticancer drug research.
When carbon escaped
from the sea


A record of boron isotopes in fossils of microscopic plankton provides fresh
evidence that some ocean regions were a source of carbon dioxide to the 
atmosphere when Earth warmed at the end of the last ice age. SEE LETTER P.219 


KATHERINE A. ALLEN 


During the last ice age, northern
continents groaned under the weight
of vast ice sheets and the concentration
of carbon dioxide in the atmosphere was
about 50% lower than it is today¹. Directly after
the ice age, during a period known as the glacial
termination, CO₂ probably escaped from
the ocean into the atmosphere, but direct evidence
for this carbon transfer has been hard to
find. On page 219 of this issue, Martinez-Boti
et al.² identify areas of CO₂ degassing from the
ocean during the termination. Their records 
provide clues to the role of the ocean in the 
dramatic transition from an icy planet to the 
milder climate of the modern world. 

Today, the exchange of CO₂ between
ocean and atmosphere varies by season and
location. In an average year, tropical oceans
(14° S to 14° N) are a strong source of CO₂ to
the atmosphere, whereas the mid-latitudes
(14-50° S and N) and North Atlantic contain
strong sinks, and the Southern Ocean
(south of 50° S) swings seasonally between
source and sink, resulting in a net neutral
effect for that region³. These patterns of gas
exchange depend primarily on the difference
between the partial pressure of CO₂ (pCO₂) in
the ocean and that of the atmosphere (partial
pressure is a measure of the pressure generated
by a component of a mixture of gases).
This difference (called ΔpCO₂) may have varied
in the past. In general, if the partial pressure
of CO₂ in seawater is greater than that in air
(that is, if ΔpCO₂ is positive), then seawater will
lose CO₂ to the air, and vice versa.
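The sign convention just described can be written out as a minimal sketch. The function names and the example partial pressures are hypothetical and chosen only to illustrate the rule; they are not values from Martinez-Boti et al.

```python
def delta_pco2(pco2_sea_uatm, pco2_air_uatm):
    """Sea-minus-air difference in CO2 partial pressure, in microatmospheres."""
    return pco2_sea_uatm - pco2_air_uatm


def exchange_direction(delta_uatm, tolerance_uatm=0.0):
    """Classify the ocean as a CO2 source, sink or in equilibrium.

    tolerance_uatm is an assumed dead band around zero, useful when the
    estimates carry measurement uncertainty.
    """
    if delta_uatm > tolerance_uatm:
        return "source"        # seawater loses CO2 to the air
    if delta_uatm < -tolerance_uatm:
        return "sink"          # seawater takes up CO2 from the air
    return "equilibrium"


if __name__ == "__main__":
    # Invented example: surface water at 300 microatmospheres under air at 280.
    difference = delta_pco2(300.0, 280.0)
    print(difference, exchange_direction(difference))
```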

Ice-core records indicate that, during the
Last Glacial Maximum (about 23,000 to 19,000
years ago), the atmospheric CO₂ concentration
was roughly 30% lower than during the preceding
warm period⁴. Somehow, during the
descent into the last ice age, the CO₂ balance
between ocean and atmosphere was tipped
in favour of the ocean. After the ice age, CO₂
levels rose again, implying that the balance
must have tipped back. These shifts in carbon
storage could have been caused by physical,
chemical or biological mechanisms (such
as changes in ocean circulation, sea-ice cover
and phytoplankton productivity), potentially
working in concert.

To gain insight into these possibilities, 
palaeoceanographers turn to sea-floor 
sediments. Marine sediments represent a rich 
archive of ocean history, in that the remains of 
biological matter and their chemical composi- 
tion reflect past biological activity and ocean 
environmental conditions. Despite much 
progress, it has been difficult to tease out the 
mechanism underlying past changes in ocean 
CO₂ storage. A particular challenge has been
the lack of direct evidence for exchange of CO₂
Figure 1 | Exchange of carbon dioxide between the ocean and atmosphere. a, Martinez-Boti et al.²
have used boron isotopes from planktonic fossils to determine past differences between the partial
pressures of CO₂ in the ocean and the atmosphere (ΔpCO₂, measured in microatmospheres) in the eastern
equatorial Pacific Ocean (EEP) and the South Atlantic Ocean (SA). b, They find that ΔpCO₂ in the EEP was
negative, on average, during the latter part of the last ice age (25,000 to 19,000 years ago), implying that
the ocean absorbed CO₂ from the atmosphere; ice-age data for the SA are not available. During the glacial
termination (about 18,000 to 11,000 years ago), average ΔpCO₂ at both sites was positive. This suggests that
the ocean released CO₂ to the atmosphere at both locations at that time, which probably contributed to
the overall rise in atmospheric CO₂ levels between the ice age and the following, milder Holocene epoch.
Data for the modern ocean (solid lines) are from ref. 3. Squares represent mean values; vertical bars
represent the full range of values for each period.
across the air-sea interface. Martinez-Boti
et al. offer an insight into this exchange that
is based on the boron isotopic composition
(δ¹¹B) of calcite (a form of calcium carbonate)
produced by planktonic foraminifera.

Planktonic foraminifera are single-celled 
organisms that spend their lives floating in 
water. Many species inhabit the upper sunlit
layers of the ocean, just below the air-sea
interface, where CO₂ exchange occurs. Modern
calibrations have shown that, as the pH of
seawater increases, the δ¹¹B of calcite produced
by planktonic foraminifera also increases⁵,⁶.

Seawater pH and pCO₂ are also directly linked.
This fact has enabled the atmospheric CO₂
record to be calculated⁷ from the δ¹¹B of calcite
in sedimentary cores taken from beneath
a mid-ocean gyre (a large rotating system of
wind-driven ocean currents), where the ocean
is roughly in equilibrium with the atmosphere
(ΔpCO₂ is approximately 0). The close agreement
between this δ¹¹B-based CO₂ record and
the record measured from bubbles of ancient
air trapped in Antarctic ice demonstrated that
δ¹¹B from the mid-ocean gyre is closely linked
to atmospheric CO₂ across ice-age cycles.

By contrast with that study, Martinez-Boti 
et al. have targeted sedimentary cores from 
ocean upwelling regions to identify specific 
locations and periods of ocean CO₂ degassing
(positive ΔpCO₂). This is not the first time that
this has been done: the δ¹¹B record from the
western equatorial Pacific Ocean⁸ suggested
that the tropics may have been a significant
source of CO₂ across the last glacial termination.
The present study builds on that work by
targeting regions in which this technique has 
not yet been applied: the eastern equatorial 
Pacific Ocean and the edge of the Southern 
Ocean in the South Atlantic Ocean. 

The authors derived ΔpCO₂ records by combining
estimates of the partial pressure of CO₂
in the ocean with CO₂ data from ice cores.
The records suggest that both the eastern
equatorial Pacific and the South Atlantic were
major sources of atmospheric CO₂ during the
deglaciation (about 18,000 to 11,000 years ago;
Fig. 1). Some of the highest ΔpCO₂ values are
accompanied by regional peaks in opal flux
(the silica produced by marine microorganisms;
see Fig. 3 of the paper). This is consistent
with the idea that intensified ocean upwelling
brought both nutrients and respired CO₂ to the
sea surface during the deglaciation⁹.

Even more intriguingly, the record for the 
eastern equatorial Pacific does not follow the 
same pattern as that for the South Atlantic, 
implying that gas exchange in the two regions 
might be controlled by different processes. 
Together, these records imply that ventilation 
of CO₂ from the Southern Ocean increased the
carbon content of the atmosphere during the 
last deglaciation, and that gas exchange in the 
tropics may also have had a pivotal role. 

These exciting hints regarding ocean- 
atmosphere carbon exchange raise further 


questions. Martinez-Boti and colleagues 
report that ΔpCO₂ in the eastern equatorial
Pacific fluctuated during the Last Glacial
Maximum when atmospheric CO₂ levels were
relatively stable, implying that other ocean
regions may have acted as sinks. Records from
other ocean regions are needed to confirm this.
It should also be noted that, although ΔpCO₂
records from certain key sites are invaluable
for testing ice-age hypotheses, the data are not
sufficient to quantify net global air-sea CO₂
exchange across ice-age cycles. For perspective,
the uncertainty associated with estimates
of modern CO₂ flux is ±50%, even though
there have been more than 3 million direct
CO₂ measurements³. Still, the new results are
promising, and their usefulness will increase 
as more records are established. 

Additional records are now needed to cover 
the other half of the glacial cycle: the descent 
into the ice age. A consideration of ΔpCO₂
obtained from the remains of planktonic
foraminifera, along with proxies of deep-ocean
carbonate chemistry (such as δ¹¹B or
the ratio of boron to calcium in the fossils of
sediment-dwelling benthic foraminifera) may
help palaeoceanographers to link sites of CO₂
storage in the interior ocean with exchange
sites at the sea surface, thereby providing a
deeper understanding of the link between
carbon and climate.


Katherine A. Allen is in the Department of 
Marine and Coastal Sciences, 

Rutgers University, New Brunswick, 

New Jersey 08901, USA. 

e-mail: kat.allen@rutgers.edu 


1. Monnin, E. et al. Science 291, 112-114 (2001).
2. Martinez-Boti, M. A. et al. Nature 518, 219-222 (2015).
3. Takahashi, T. et al. Deep Sea Res. II 56, 554-577 (2009).
4. Petit, J. R. et al. Nature 399, 429-436 (1999).
5. Sanyal, A., Bijma, J., Spero, H. J. & Lea, D. W. Paleoceanography 16, 515-519 (2001).
6. Henehan, M. J. et al. Earth Planet. Sci. Lett. 364, 111-122 (2013).
7. Hönisch, B. & Hemming, N. G. Earth Planet. Sci. Lett. 236, 305-314 (2005).
8. Palmer, M. R. & Pearson, P. N. Science 300, 480-482 (2003).
9. Anderson, R. F. et al. Science 323, 1443-1448 (2009).


Cold shock protects
the brain


A protein released during hypothermia has been found to affect the progression 
of neurodegenerative disease in mice by sparing neurons from death and 
preserving the connections between them. SEE LETTER P.236 


GRAHAM KNOTT 


History is littered with examples of the
use of cold as a therapeutic agent,
from the ancient Greek physician
Claudius Galen, who used it to treat fevers,
to its application for improving the outcomes 
of surgery after serious battlefield injuries. 
Mounting evidence of its effectiveness led surgeons
in the 1950s to use hypothermia as a tool
to ameliorate the side effects of brain surgery’, 
and even today, cold is an effective treatment 
for birth asphyxia — a lack of oxygen during 
the perinatal period that can lead to brain dam- 
age. It was not until 1987 that an animal study” 
showed that cooling could reduce neuronal 
death after brain injury. In this issue, Peretti 
et al.’ (page 236) provide compelling evidence 
that a specific protein affects the brain's abil- 
ity to reshape its connections in response to 
cooling, and that the protein’s overexpression 
might provide therapeutic benefits in neuro- 
degenerative disease. 
The neuroprotection offered by cold has 
spurred efforts to understand the mechanisms
responsible, and findings so far offer insight
into how brain metabolism and cell death 
are affected*. One aspect of neuroprotection 
research that is attracting interest involves 
a small group of proteins whose synthesis 
increases during hypothermia, even as pro- 
duction of other proteins decreases. These 
cold-shock proteins, which are found in many 
species, bind with RNA and mediate overall 
protein production. One such protein in par- 
ticular, RNA-binding motif protein 3 (RBM3), 
is emerging as a central player in the protec- 
tion of neurons after periods of hypothermia’. 
Investigations have focused mainly on how 
RBM3 and other cold-shock proteins affect 
the outcome of various types of stroke. 

In their study, Peretti and colleagues sub- 
jected mice modelling Alzheimer’s disease 
and mice infected with prions (proteins that 
cause neurodegenerative diseases such as 
Creutzfeldt-Jakob disease) to deep hypo- 
thermia, with the animals’ body tempera- 
tures being reduced from 37°C to 16-18°C 
for 45 minutes. The authors identified and 
counted the synaptic connections between 
neurons using electron microscopy, and found 
that, during hypothermia, the number of con- 
tacts dropped significantly in a region of the 
brain implicated in memory formation — the 
hippocampus. After young animals had been 
warmed up, the number of synapses returned 
to normal. 

In older animals, the same loss of synapses 
occurred, but the recovery did not. The effects 
on RBM3 also differed in old and young ani- 
mals: whereas levels of RBM3 rose in response 
to the temperature drop in young mice, they 
failed to do so in older ones. Although the 
researchers found no symptoms of neuro- 
degeneration in older animals on the basis of 
histological sections, biochemical measure- 
ments or behaviour, these mice were presum- 
ably further along the road to disease. 

Neurodegenerative disease is known to 
involve the progressive loss of synapses and 
neurons, with a resulting reduction in cogni- 
tive ability. Peretti and co-workers’ experiment 
is interesting because of the lack of response 
of RBM3 to cooling in pre-symptomatic older 
animals. Another aspect of note is that this is 
the first demonstration of synaptic loss fol- 
lowed by recovery using a laboratory-based 
model of hibernation. Previous work® has 
shown some synaptic remodelling in hibernat- 
ing animals that is suggestive of a weakening of
connections between neurons; these may then 
strengthen on arousal. The current experiment, 
however, shows a more-extensive alteration 
in neuronal connectivity in laboratory mice. 
It would be useful to see how these synap- 
tic changes alter the functioning of neuronal 
circuits in the hippocampus and other brain
regions during hibernation (Fig. 1).

The fact that plasticity and recovery of 
synaptic connections is missing when RBM3 
levels do not rise led Peretti and colleagues to 
investigate whether artificially boosting levels 
of this molecule would protect against neuro- 
degenerative disease. They therefore subjected 
prion-infected mice to two hypothermia treat- 
ments spaced one week apart. This caused 
RBM3 levels to remain raised for as long as 
eight weeks in young mice, and resulted in no 
synaptic or neuronal loss, or deficits in behav- 
iour, well into what would normally be the 
terminal phase of the disease. Protection did not 
occur in older mice, nor, perhaps more impor- 
tantly, when the researchers inhibited RBM3. 
The latter result directly implicates the RBM3 
protein, rather than some other factor of the 
hypothermia treatment, in neuroprotection. 

The authors then left the hypothermia 
approach behind, and instead merely raised 
the levels of RBM3 by injecting prion-infected 
mice with a viral construct containing this pro- 
tein, causing cells to overproduce it. This again 
imparted neuroprotection and suppression of 
behavioural deficits. 

Peretti and colleagues’ work points to one 
way in which the progression of devastating 
degenerative diseases might be slowed or 
halted. But it also indicates that tight control 
of protein-synthesis pathways is key for main- 
taining neuronal circuitry in healthy brains. 
The authors found that when they decreased 
the levels of RBM3 in control animals, the 
density of synapses also fell. Although this 
result is given only cursory mention in the 
paper, it suggests that RBM3, which is present
at high levels when neuronal circuits are
being laid down during brain development,
is also involved in their maintenance later
on. Because some synapses in the adult brain
are constantly being lost and re-formed, this
is perhaps not surprising. Synaptic turnover
is crucial to the ability of neuronal circuits to
modify their function in response to changes
in activity⁷. This raises a question about the
extent to which molecules such as RBM3 are
involved in adult brain plasticity.

Figure 1 | A hibernating hazel dormouse. Peretti et al.³ show that a laboratory-based model of hibernation reveals extensive remodelling of synaptic connections, providing clues to what happens in the wild.

The authors’ work also highlights the need 
for a more-complete picture of the complex 
mechanisms of neuronal protein production, 
particularly at the synapse. This could have far- 
reaching consequences, not only for how we 
tackle neurodegenerative diseases and brain 
lesions caused, for example, by stroke, but also 
for our understanding of, and approach to, 
disorders of brain plasticity such as ageing and 
mental retardation.


From quantum matter to high-temperature
superconductivity in copper oxides


B. Keimer¹, S. A. Kivelson², M. R. Norman³, S. Uchida⁴ & J. Zaanen⁵


The discovery of high-temperature superconductivity in the copper oxides in 1986 triggered a huge amount of 
innovative scientific inquiry. In the almost three decades since, much has been learned about the novel forms of 
quantum matter that are exhibited in these strongly correlated electron systems. A qualitative understanding of the 
nature of the superconducting state itself has been achieved. However, unresolved issues include the astonishing 
complexity of the phase diagram, the unprecedented prominence of various forms of collective fluctuations, and the 
simplicity and insensitivity to material details of the ‘normal’ state at elevated temperatures. 


The discovery of high-temperature superconductivity in the
copper oxide perovskite La₂₋ₓBaₓCuO₄ (ref. 1) ranks among the
major scientific events of the twentieth century. The superconducting
transition temperatures in the copper oxides greatly exceed those of any
previously known superconductor by almost an order of magnitude; in
1986 the highest possible temperature at which superconductivity could
survive was widely believed to be 30 K (Fig. 1). Moreover, according to
the theory of ‘conventional’ superconductors, the copper oxides would
have seemed the least likely materials in which to look for superconductivity:
at room temperature they are such poor conductors that they can
hardly be classified as metals and, indeed, if their chemical composition
is very slightly altered they become highly insulating antiferromagnets.
Magnetism arises from strong repulsive interactions between electrons,
whereas conventional superconductivity arises from induced attractive
interactions, making magnetism and superconductivity seemingly
antithetical forms of order.

The Bardeen-Cooper-Schrieffer (BCS) theory² of the late 1950s provided
an extremely successful framework within which to understand
conventional superconductors, and gave rise to conceptual breakthroughs.
The basic insight is that the electrons collectively bind into ‘Cooper’ pairs
and simultaneously condense in much the same way as bosons condense
into a superfluid state. Fundamental to the BCS mechanism is the fact that,
despite the strong direct Coulomb repulsions, the relatively weak attractions
between electrons induced by the coupling to the vibrations of the
lattice (phonons) can bind the electrons into pairs at energies smaller than
the typical phonon energy. This was widely believed to imply that the
superconducting transition temperature T_c of conventional superconductors
could never exceed 30 K (ref. 3), although this limit has been revised
upwards by the discovery in 2001 of superconductivity with T_c = 39 K in
the simple metal MgB₂ (ref. 4), where circumstances conspire to optimize
the electron-phonon mechanism. However, this is still far below the
maximum T_c of the copper oxides.

As the properties of the copper oxides were studied with ever-increasing
precision and sensitivity, it became clear that much of the well-understood
quantum theory of the electronic properties of solids, which has been
spectacularly successful in accounting for the properties of conventional
metals and superconductors, fails entirely to address many features of the
copper oxides and, more generally, of a broad array of ‘highly correlated
electron systems’ of which the copper oxides are the most studied. (A
schematic phase diagram of the copper oxides is shown in Fig. 2.)

Most prominently, at temperatures well above T_c the conductivity in
the copper oxides is almost two orders of magnitude smaller than in simple
metals and exhibits frequency and temperature dependences that are
incompatible with the conventional theory of metals; this has led to materials
in the regime above T_c being referred to as ‘strange metals’ or ‘bad
metals’. The behaviour exhibited by these ‘strange metals’, much of which
is simple to describe in terms of the so-called “marginal-Fermi-liquid 
phenomenology”, has resisted any generally accepted understanding. On 
the other hand, similar behaviour has now been documented in a large 
number of electronically interesting materials®, indicating that this is a 
general property of strongly correlated electron systems, and is not directly
linked to high-temperature superconductivity. We consider this to
be the most important open problem in the understanding of quantum
materials, and it is here that radically new ideas, including those derived
from recently developed non-perturbative studies in string theory, may
be useful.

Figure 1 | T_c versus time. Superconducting transition temperatures versus
year of discovery for various classes of superconductors. The images on the
right are the crystal structures of representative materials. The established
record for conventional electron-phonon superconductors (yellow) is 39 K
in MgB₂. Given the small Fermi energies, the T_c values found in the family
of heavy fermion superconductors (green) are actually remarkably high.
There has been much interest in recent years in the new family of ‘iron
superconductors’ (purple) in which T_c values approach 60 K. The record
holders are found in the copper oxide family (red), with a maximum T_c of 165 K
found in a ‘mercury’ copper oxide under pressure (dashed red line).

Figure 2 | Phase diagram. Temperature versus hole doping level for the
copper oxides, indicating where various phases occur. The subscript ‘onset’
marks the temperature at which the precursor order or fluctuations become
apparent. T_S,onset (dotted green line), T_C,onset and T_SC,onset (dotted red line for
both) refer to the onset temperatures of spin, charge and superconducting
fluctuations, while T* indicates the temperature where the crossover to the
pseudogap regime occurs. The blue and green regions indicate fully developed
antiferromagnetic order (AF) and d-wave superconducting order (d-SC)
setting in at the Néel and superconducting transition temperatures T_N and T_c,
respectively. The red striped area indicates the presence of fully developed
charge order setting in at T_CDW. T_SDW represents the same for incommensurate
spin density wave order. Quantum critical points for superconductivity and
charge order are indicated by the arrows.

More unique to the copper oxides is the behaviour observed in a range 
of temperatures immediately above T_c in what is referred to as the
‘pseudogap’ regime. It is characterized by a substantial suppression of the 
electronic density of states at low energies that cannot be simply related to 
the occurrence of any form of broken symmetry. Although much about 
this regime is still unclear, convincing experimental evidence has recently 
emerged that there are strong and ubiquitous tendencies towards several 
sorts of order or incipient order, including various forms of charge- 
density-wave, spin-density-wave, and electron-nematic order. There is 
also suggestive, but far from definitive, evidence of several sorts of novel 
order—that is, never before documented patterns of broken symmetry— 
including orbital loop current order and a spatially modulated super- 
conducting phase referred to as a ‘pair-density wave’. There are many 
fascinating aspects of these ‘intertwined orders’ that remain to be under- 
stood, but their existence and many aspects of their general structure were 
anticipated by theory’. Superconducting fluctuations also have an important 
role in part of this regime, although to an extent that is still much debated. 

The high-temperature superconducting phase itself has a pattern of 
broken symmetry that is distinct from that of conventional superconduc- 
tors. Unlike in conventional s-wave superconductors, the superconduct- 
ing wavefunction in the copper oxides has d-wave symmetry*”, that is, it 
changes sign upon rotation by 90°. Associated with this ‘unconventional 
pairing’ is the existence of zero energy (gapless) quasiparticle excitations 
at the lowest temperatures, which make even the thermodynamic prop- 
erties entirely distinct from those of conventional superconductors (which 
are fully gapped). The reasons for this, and its relation to a proximate anti- 
ferromagnetic phase, are now well understood, and indeed were also anti- 
cipated early on by some theories'*"'”. However, while various attempts 
to obtain a semiquantitative estimate of T_c have had some success”’, there
are important reasons to consider this problem still substantially unsolved. 
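
As an illustration of the sign change described above, the sketch below uses the simplest d-wave form, Δ(k) = (Δ₀/2)(cos k_x − cos k_y). This functional form, the function names and the unit gap amplitude are assumptions made for the example and are not taken from the text; momenta are in units of the inverse lattice spacing.

```python
import math

def d_wave_gap(kx, ky, delta0=1.0):
    """Lowest-harmonic d-wave gap (an assumed, commonly used parametrization)."""
    return 0.5 * delta0 * (math.cos(kx) - math.cos(ky))

def rotate_90(kx, ky):
    """Rotate a momentum vector by 90 degrees about the zone centre."""
    return -ky, kx

k_antinode = (math.pi, 0.0)            # zone-edge point, where |gap| is largest
k_node = (math.pi / 2, math.pi / 2)    # point on the zone diagonal

gap_before = d_wave_gap(*k_antinode)
gap_after = d_wave_gap(*rotate_90(*k_antinode))
gap_on_diagonal = d_wave_gap(*k_node)

print("gap at antinode:", gap_before)
print("gap after 90-degree rotation:", gap_after)
print("gap on the diagonal:", gap_on_diagonal)
```

The rotated gap has the same magnitude and the opposite sign, and the gap vanishes on the zone diagonal, which is where the gapless quasiparticle excitations mentioned above reside.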


Highly correlated electrons in the copper oxides 


The chemistry of the copper oxides amplifies the Coulomb repulsions 
between electrons. The two-dimensional copper oxide layers (Fig. 3) are 
separated by ionic, electronically inert, buffer layers. The stoichiometric 
‘parent’ compound (Fig. 2, zero doping) has an odd-integer number of 
electrons per CuO₂ unit cell (Fig. 3). The states formed in the CuO₂ unit
cells are sufficiently well localized that, as would be the case in a collec- 
tion of well-separated atoms, it takes a large energy (the Hubbard U) to 
remove an electron from one site and add it to another. This effect pro- 
duces a ‘traffic jam’ of electrons”. An insulator produced by this classical 
jamming effect is referred to as a “Mott insulator”’*. However, even a 
localized electron has a spin whose orientation remains a dynamical degree 
of freedom. Virtual hopping of these electrons produces, via the Pauli 
exclusion principle, an antiferromagnetic interaction between neighbour- 
ing spins. This, in turn, leads to a simple (Néel) ordered phase below room 
temperature, in which there are static magnetic moments on the Cu sites 
with a direction that reverses from one Cu to the next'®””. 

The Cu-O planes are ‘doped’ by changing the chemical makeup of
interleaved ‘charge-reservoir’ layers so that electrons are removed from
(hole-doped) or added to (electron-doped) the copper oxide planes (see the
horizontal axis of Fig. 2). In the interest of brevity, we will confine our
discussion to hole-doped systems. Hole doping rapidly suppresses the
antiferromagnetic order. At a critical doping of p_min, superconductivity
sets in, with a transition temperature that grows to a maximum at p_opt,
then declines for higher dopings and vanishes at p_max (Fig. 2). Materials
with p < p_opt are referred to as underdoped and those with p_opt < p are
referred to as overdoped.
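
The doping regimes defined above can be sketched as a small classifier. The numerical values p_min = 0.05, p_opt = 0.16 and p_max = 0.27, and the parabolic dome used in tc_fraction, are commonly quoted empirical values for hole-doped copper oxides; they are assumptions of this example, not values given in this review, and they vary between material families.

```python
def doping_regime(p, p_min=0.05, p_opt=0.16, p_max=0.27):
    """Label a hole doping level p (holes per planar Cu); thresholds are assumed."""
    if p < p_min or p > p_max:
        return "non-superconducting"
    if p < p_opt:
        return "underdoped"
    if p > p_opt:
        return "overdoped"
    return "optimally doped"

def tc_fraction(p, p_opt=0.16, curvature=82.6):
    """Empirical parabolic dome, T_c / T_c,max = 1 - 82.6 (p - 0.16)^2, floored at zero."""
    return max(0.0, 1.0 - curvature * (p - p_opt) ** 2)

for p in (0.02, 0.10, 0.16, 0.20, 0.30):
    print(p, doping_regime(p), round(tc_fraction(p), 3))
```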

It is important to recognize that the strong electron repulsions that 
cause the undoped system to be an insulator (with an energy gap of 2 eV) 
are still the dominant microscopic interactions, even in optimally doped 
copper oxide superconductors. This has several general consequences. The 
resulting electron fluid is ‘highly correlated’, in the sense that for an elec- 
tron to move through the crystal, other electrons must shift to get out of 
its way. In contrast, in the Fermi liquid description of simple metals, the 
quasiparticles (which can be thought of as ‘dressed’ electrons) propagate 
freely through an effective medium defined by the rest of the electrons. 
The failure of the quasiparticle paradigm is most acute in the ‘strange metal’ 
regime, that is, the ‘normal’ state out of which the pseudogap and the 
superconducting phases emerge when the temperature is lowered. None- 
theless, in some cases, despite the strong correlations, an emergent Fermi 
liquid arises at low temperatures. This is especially clear in the overdoped 
regime (Fig. 2). But recently it has been shown that even in underdoped 
materials, at temperatures low enough to quench superconductivity by 
the application of a high magnetic field, emergent Fermi liquid behaviour
arises, albeit with characteristics (for example, a reconstructed Fermi
surface) that are quite different from those predicted by band theory’*””.
Nevertheless, over most of the phase diagram, the frustration of the
coherent electron motion produces physics that is qualitatively distinct from
that of simple metals.

Figure 3 | Crystal structure. Layered copper oxides are composed of CuO₂
planes, typically separated by insulating spacer layers. The electronic structure
of these planes primarily involves hybridization of a 3d_{x²−y²} hole on the
copper sites with planar-coordinated 2p_x and 2p_y oxygen orbitals.

Although the large zero-point energy of electrons in a usual metal re- 
sults in a quantum ‘rigidity’ that greatly suppresses all forms of inhomo- 
geneous states, the Mott physics and the short-range antiferromagnetic 
correlations inherited from the undoped ‘parent’ compound combine to 
produce a local tendency to phase separation and various forms of order, 
which spontaneously break the translational symmetry of the underlying 
crystal”® ”. Thus, especially in the pseudogap regime of the phase diagram, 
it is unsurprising that various forms of order occur on intermediate length 
scales. 


Pairing in an unconventional superconductor 


It is now well established that electrons can form pairs, even when they 
repel each other at a microscopic scale. However, this involves non-trivial 
physics. A model that is often used as a point of departure for theoretical 
discussions is the famous Hubbard model, describing electrons hopping 
on a lattice parametrized in terms of the bandwidth W = 8t (where t is a
measure of the ‘hopping’ energy gain due to delocalization of the electrons)
and an on-site electron-electron repulsion U. In the copper oxides, U and
W are comparable. Even for this simplified model, analytic solutions are
not available. However, approximate solutions of the doped Hubbard 
model can be obtained in several ways, and these invariably point to a 
d-wave superconducting ground state. 
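
The relation W = 8t can be checked numerically for the simplest case, a square lattice with nearest-neighbour hopping only, where the band dispersion is ε(k) = −2t(cos k_x + cos k_y). That dispersion and the grid size are assumptions of this sketch; realistic copper oxide bands include further-neighbour hopping.

```python
import math

def dispersion(kx, ky, t=1.0):
    """Nearest-neighbour tight-binding band on a square lattice (assumed form)."""
    return -2.0 * t * (math.cos(kx) + math.cos(ky))

def bandwidth(t=1.0, n=201):
    """Band maximum minus band minimum, sampled on an n-by-n grid of the Brillouin zone."""
    ks = [-math.pi + 2.0 * math.pi * i / (n - 1) for i in range(n)]
    energies = [dispersion(kx, ky, t) for kx in ks for ky in ks]
    return max(energies) - min(energies)

W = bandwidth(t=1.0)
print("bandwidth in units of t:", W)
```

The band minimum sits at the zone centre and the maximum at the zone corner, so the grid must contain both points; an odd number of points spanning −π to π does.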

An intuitive understanding of the mechanism of pairing is best ob- 
tained by approaching the problem from an unrealistic weak-coupling 
perspective, that is, assuming U ≪ W (ref. 23). Here, the gap structure is
determined by the solution of a variant of the original BCS equations, in
which an appropriately renormalized two-particle vertex function, Γ(k),
plays the part of an effective interaction. For the case of purely repulsive
interactions, if Γ is sufficiently k-dependent, a sign-changing superconducting
order parameter (where Δ(k) and Δ(k + Q) have opposite sign)
results for which interactions involving small momentum transfer are
pair-breaking, and those with large momentum transfer near Q promote pairing.
In particular, if there are antiferromagnetic correlations, this typically
implies a peak in Γ at the antiferromagnetic ordering vector, Q = Q_AF
(ref. 24), which is also an ideal vector for scattering between ‘antinodal’ 
regions of the Fermi surface of the copper oxides shown in Fig. 4; that is, 
precisely those regions where the d-wave gap is largest and of opposite 
sign. The gap ‘nodes’ along the diagonals of the Brillouin zone are then, 
in turn, where the d-wave gap vanishes. 
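
The role of large versus small momentum transfer can be illustrated with a two-patch toy version of the linearized gap equation, λΔ_i = −Σ_j V_ij Δ_j, where the two patches stand for the antinodal regions near (π, 0) and (0, π). The patch reduction, the names v_small and v_large and the numerical values are assumptions of this sketch, not the authors' calculation; a positive eigenvalue λ marks a channel in which pairing is favoured.

```python
def apply_kernel(gap, v_small, v_large):
    """Apply -V to a two-component gap; all entries of V are repulsive (positive)."""
    d1, d2 = gap
    return (-(v_small * d1 + v_large * d2),
            -(v_large * d1 + v_small * d2))

def channel_eigenvalue(gap, v_small, v_large):
    """Eigenvalue of -V for a trial gap that is an eigenvector of the kernel."""
    out = apply_kernel(gap, v_small, v_large)
    lam = out[0] / gap[0]
    # Confirm that the trial gap really is an eigenvector.
    assert all(abs(o - lam * g) < 1e-12 for o, g in zip(out, gap))
    return lam

# v_small: scattering within a patch (small momentum transfer).
# v_large: scattering between patches (momentum transfer near Q).
v_small, v_large = 0.5, 2.0
lambda_same_sign = channel_eigenvalue((1.0, 1.0), v_small, v_large)
lambda_sign_changing = channel_eigenvalue((1.0, -1.0), v_small, v_large)

print("same-sign gap:", lambda_same_sign)
print("sign-changing gap:", lambda_sign_changing)
```

With v_large greater than v_small, only the sign-changing gap has a positive eigenvalue: scattering between the patches promotes pairing, whereas scattering within a patch is pair-breaking.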

Superconductivity in the Hubbard model cannot truly be approached 
from the strong-coupling limit, since there is now strong numerical evi- 
dence that for a broad range of doping, the ground state of the Hubbard 
model is ferromagnetic rather than superconducting for large enough U/t 
(ref. 25). However, the closely related t-J model (with J = 4t²/U being the
superexchange interaction between copper spins mediated by the inter- 
vening oxygen ions) incorporates the essence of the strong-coupling phys- 
ics through the constraint that no more than one electron at a time can 
occupy a given site. The t-J model can then be addressed, with values of 
J/t ~ 0.5, as a reasonable model in its own right. Although no controlled 
solution is known, the superconducting tendencies of this model have been 
investigated numerically since the early days of high-temperature super- 
conductivity research**”. It is striking that the character and symmetry of 
the superconducting state itself and its association with short-range anti- 
ferromagnetic correlations look grossly similar, regardless of perspective”. 
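
The superexchange scale can be illustrated on the smallest possible cluster. For two sites with two electrons, the Hubbard model has a singlet ground state lying (√(U² + 16t²) − U)/2 below the triplet, which reduces to J = 4t²/U when U ≫ t. The two-site reduction and the choice U = 8t (that is, U = W, in line with the statement that U and W are comparable) are assumptions of this sketch.

```python
import math

def superexchange(t, u):
    """Leading-order superexchange, J = 4 t^2 / U."""
    return 4.0 * t ** 2 / u

def two_site_singlet_triplet_gap(t, u):
    """Exact singlet-triplet splitting of the two-site Hubbard model at half filling."""
    return 0.5 * (math.sqrt(u ** 2 + 16.0 * t ** 2) - u)

t, u = 1.0, 8.0  # U = W = 8t, an assumed illustrative choice
j_perturbative = superexchange(t, u)
j_exact = two_site_singlet_triplet_gap(t, u)

print("J/t from 4t^2/U:", j_perturbative / t)
print("J/t from two-site exact result:", j_exact / t)
```

For U = 8t the leading-order formula gives J/t = 0.5, the value quoted for the t-J model, and the exact two-site splitting is slightly smaller, as expected when U is only moderately larger than t.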

Although intermediate coupling problems have thus far not been suc- 
cessfully solved by controlled analytic approaches, the lack of any small 
dimensionless parameters probably implies the lack of any long emer- 
gent length scales in the problem, except near a quantum critical point 
(QCP). With this in mind, a variety of numerical techniques have been 
developed to study this regime, including exact diagonalization (limited 
to small clusters), quantum Monte Carlo and its derivatives (variational
and fixed node approximations to get around the issue of negative
probabilities in fermion simulations), dynamical mean field theory” and its
cluster generalizations (either in momentum space or real space), density
matrix renormalization group (designed for one-dimensional problems
but can simulate strips), and its two-dimensional generalizations”. These
methods all have their pros and cons. They have, however, taught us that
if superconductivity occurs, it is invariably of d-wave symmetry, but also
that many competing states are close in energy, especially unidirectional
charge order*®”’.

Figure 4 | Fermi surface, Fermi arcs and gap functions. The large Fermi
surface predicted by band theory is observed by ARPES and STS for overdoped
compounds (bottom right). But once the pseudogap sets in, the antinodal
regions of the Fermi surface near the Brillouin zone edge are gapped out, giving
rise to Fermi arcs (top right). This is reflected (left) in the angle dependence
of the energy E of the superconducting gap Δ_SC (blue line) and pseudogap
Δ_PG (red line) as functions of the momenta k_x and k_y in one quadrant of the
Brillouin zone around the underlying large Fermi surface (dashed curve),
as revealed by ARPES and STS. Note the gapless region around the d-wave
superconducting node for the pseudogap case that defines the Fermi arcs.
These arcs appear to be reconstructed into electron pockets centred at
(Q/2, Q/2) once charge order sets in, as revealed by quantum oscillation
studies, where (Q, 0) is the charge order wavevector”’.


High-T_c superconductivity
Of course, there is additional complexity in going from theoretical results 
for simple model problems to the real experimental systems. Static 
antiferromagnetism disappears quickly as a function of doping (Fig. 2), 
but both inelastic neutron scattering and resonant inelastic X-ray scattering 
reveal that the antiferromagnetism of the insulator survives in the super- 
conductor to a degree in the form of dynamical magnetic fluctuations which 
are much stronger than in conventional metals (and are strongly renormalized
when cooling below T_c)**». It is physically appealing to use the
measured spin fluctuation spectrum as an approximation of the vertex Γ
mentioned above. This yields reasonable values for T_c and also appears to
be consistent with some of the single electron self-energy effects detected 
by various electron spectroscopies such as ARPES (angle resolved photo- 
emission spectroscopy) and STS (scanning tunnelling spectroscopy)'*”®. 
However, this approach, departing from the idea of the attractive force 
(‘glue’) induced by spin fluctuations, has shortcomings”. Despite its intu- 
itive appeal, it is not based on controlled mathematics, since the same 
electrons that are pairing also form the ‘glue’. Another difficulty is that these 
simplified models leave out other effects that can influence the magnitude 
of T_c. A case in point is the electron-phonon interaction. There is good
evidence that phonons affect both ARPES and STS line shapes**, while 
strong anomalies are seen in the phonon spectra’’. There are a number of 
other neglected effects that are worrisome, in particular the non-local Cou- 
lomb interaction which is an especially relevant concern given the poor 
screening in the direction perpendicular to the planes*’. 

More seriously, some fundamental aspects of high-T_c superconductivity
are qualitatively different from the BCS variety. An example is the
influence of quenched disorder. In absolute terms, most copper oxides can 
be regarded as chemically ‘dirty’, owing to their doped (non-stoichiometric) 
nature. In BCS theory, an important difference between s-wave and 
unconventional superconductivity is that the former is relatively imper- 
vious to structural disorder, while the latter is readily degraded*’. The 
strong inhomogeneity seen by STS leaves no doubt that many copper 
oxide superconductors reside in a disordered lattice potential, but the 
d-wave superconductivity appears to be fairly insensitive to this adverse 
condition. It has been suggested that this is a consequence of strong local 
correlations”, but it still remains a puzzle. 

A very basic quantity for the superconducting order is the superfluid 
density ρ_s, the quantity that parametrizes the rigidity of the phase of the
superconducting order parameter, which also determines the capacity of
the superconductor to expel electromagnetic fields. One can identify a
temperature associated with the fluctuations of the phase, T_θ ~ ρ_s/m*, where
m* is the effective mass. In a BCS superconductor, ρ_s is equal to the total
density of electrons at zero temperature and accordingly T_θ ≫ T_c ~ 0.5Δ₀/k_B,
the temperature associated with the formation of the pairs, where Δ₀ is the
average superconducting gap (k_B is the Boltzmann constant). The fluctuations
of the phase of the superconducting condensate are largely irrelevant;
once Cooper pairs form, they automatically condense. Turning to
the copper oxides, it was established early on that the superfluid density is
anomalously small, scaling in the underdoped regime with T_c (the ‘Uemura
law’). The conclusion is that in the underdoped copper oxides, T_θ and
the pair-binding energy are of the same order and the thermal fluctua- 
tions of the phase should be crucial for the thermodynamics“. A long- 
standing question is whether, perhaps, pairs already form at the (very high) 
pseudogap temperature T* (Fig. 2), while at a much lower temperature, the 
actual T_c, the phase locks to form the long-range ordered superconducting
state. As we will see, the physics of the phase fluctuations is intertwined
with that of competing order.
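
For a sense of scale, the BCS estimate quoted above, T_c ~ 0.5Δ₀/k_B, can be evaluated directly. The gap value of 1 meV is an illustrative assumption, as are the function and parameter names; the weak-coupling BCS ratio Δ₀ = 1.764 k_B T_c corresponds to a prefactor of about 0.57 rather than 0.5.

```python
K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV per kelvin

def bcs_tc_estimate(gap_ev, prefactor=0.5):
    """Pairing temperature in kelvin from a zero-temperature gap given in eV."""
    return prefactor * gap_ev / K_B_EV_PER_K

gap_ev = 1.0e-3  # assumed 1 meV gap, typical of a conventional superconductor
tc_rough = bcs_tc_estimate(gap_ev)
tc_weak_coupling = bcs_tc_estimate(gap_ev, prefactor=1.0 / 1.764)

print("T_c with prefactor 0.5 (K):", round(tc_rough, 2))
print("T_c with weak-coupling BCS ratio (K):", round(tc_weak_coupling, 2))
```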

In many cases, in copper oxides with the highest T_c (that is, at optimal
doping), the superconductivity emerges directly as an instability of the 
strange metal phase. The strange metal is the least understood part of the 
phase diagram, because it does not appear to be describable in terms of 
weakly interacting Landau quasiparticles. The very non-BCS transition 
from the strange metal to the more conventional physics of the super- 
conducting state is vividly apparent in the temperature evolution of the 
ARPES spectra at momenta near the ‘antinodes’ (Fig. 4) where the 
pairing forces of the d-wave superconductor are supposedly the stron- 
gest. For these momenta, the electron spectral function is strongly broa- 
dened as a function of energy*’. Upon entering the superconducting phase, 
a quasiparticle peak starts to develop that has the classic backbending 
(Bogoliubov) dispersion of a BCS superconductor**. This is, in turn, con- 
sistent with the onset of coherence observed in microwave, infrared and 
thermal conductivities. But, unlike in BCS theory, it appears that the spec- 
tral weight of this antinodal “Bogoliubon” is linearly proportional to the 
superfluid density, both as a function of doping and temperature*”*. This 
is not understood: it is as if the phase coherence of the superconductor 
‘freezes out’ single-particle coherence from the highly collective non- 
Fermi-liquid strange metal state. 


Pseudogap regime 
A prominent feature of this regime of the phase diagram is the line T*, 
which denotes the onset of a partial gap observed in spectroscopic data. 
First inferred from nuclear magnetic resonance measurements that showed 
a reduction in the low-frequency spin excitations*””°, this pseudogap was 
subsequently seen in c-axis polarized infrared conductivity measurements 
and is associated with a pronounced upturn in the c-axis resistivity*’. In 
contrast, the in-plane polarized infrared conductivity indicates a drop 
in the scattering rate’, which is reflected in a reduction of the planar 
resistivity?**. 

A much debated question is whether conventional (Hartree-Fock) mean- 
field treatments are able to provide even a qualitatively correct account of 
the pseudogap phenomenology. There are surely issues related to the
‘plethora of orders’ discussed immediately below, something which is not 
easy to understand in the conventional way. However, one obtains a 
much sharper view using electron spectroscopies. The striking difference 
in the nature of the electronic excitations measured in ARPES when cross- 
ing from the ‘coherent’ nodal region to the ‘incoherent’ antinodal region 
in momentum space is called the nodal-antinodal dichotomy”’. The nodal 
region involves a narrow region around the zone diagonals, which gradu- 
ally grows with increasing doping until it encompasses the entire Fermi 
surface in sufficiently overdoped materials. While the antinodal region 
lacks any quasiparticle-like spectral peaks, throughout the pseudogap 
regime it exhibits a suppression of low-energy electronic spectral weight 
on an energy scale that corresponds to the pseudogap*. 

The astonishing character of these observations is best illustrated by 
showing a map of the spectral weight at low energy as a function of k in
the first Brillouin zone (Fig. 4). In a Fermi liquid, the Fermi surface de- 
lineates the boundary between occupied and unoccupied quasiparticle 
states, so no matter how complicated it may be, the one thing it cannot 
do is abruptly end. However, in the pseudogap regime, there appear to be 
“Fermi arcs” in the nodal regime”*. In a mean-field theory, the effective 
potential associated with a (density-wave) state that breaks translational 
symmetry can reconstruct a large Fermi surface, producing small Fermi 
surface pockets, but these must still form connected manifolds. It is plau- 
sible that the Fermi arcs are actually the front half of such a pocket*”**, 
and hence there has been an intense search to find the “backside of the 
pocket”, but at present there is no definitive sign of it. 

STS has proved particularly revealing in this context. Such data (mostly 
below Tc) exhibit electronic waves in real space that upon Fourier transformation
show peaks that disperse with bias and have been mapped to
scattering across the Fermi surface”. One finds that in the superconduct- 
ing state, the low-energy excitations near the nodes behave just as one 
would expect for a BCS d-wave state, but at higher energies, cross over to 
a dispersionless pattern characteristic of short-range stripe order. Inter- 
estingly, this low-energy dispersing pattern maps out the same Fermi arc*" 
in momentum space (Fig. 4) as observed directly by ARPES* in the pseu- 
dogap state, with the arc recovering the full Fermi surface once the doping 
exceeds a critical value. However, the STS result, indicating a complete 
loss of coherence in the antinodal regime, appears to be inconsistent with 
the ARPES, which sees antinodal quasiparticles below Tc, even for underdoped
materials’. The higher-energy dispersionless pattern is seen at all
energies when moving above Tc into the pseudogap state**®*, consistent
with local charge order, and has been identified as coexisting with the
low-energy dispersing signal below Tc as well®**’. But the consistency of the
ARPES data with charge order is still an active area of debate™—to date, 
no unambiguous signatures associated with the stripe wavevectors have 
been found. 


Precursor pairing 

The structure of the pseudogap in momentum space was directly mapped 
by ARPES experiments at temperatures between T* and Tc, and found to
crudely mimic the d-wave superconducting gap: the pseudogap is appar- 
ent only in the ‘antinodal’ regions of the Brillouin zone (Fig. 4)°””° where 
the d-wave gap is largest. This immediately suggests that at the very high 
pseudogap temperature T*, pairs already start to form, while phase fluctuations
prohibit superconducting order until much lower temperatures
are reached. So long as there is substantial short-range phase coherence, 
superconducting fluctuations should have large and identifiable signa- 
tures. For instance, for a range of temperatures that extends up to about 
1.5Tc (but not to temperatures comparable to T*), large fluctuation conductivity
(at both direct and alternating current) is observed”', but there is
some debate about how much such signatures differ from those observed 
in classic superconductors. 
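
To make the momentum dependence concrete, the following minimal sketch evaluates a d-wave gap on a square lattice with unit lattice constant. The form factor Δ(k) = (Δ0/2)(cos kx − cos ky) is the standard textbook choice and is an assumption here, as the text only states that the d-wave gap is largest at the antinodes. The default `delta0` is a placeholder in arbitrary units, not a measured value.

```python
import math


def d_wave_gap(kx, ky, delta0=1.0):
    """Textbook d-wave gap: Delta(k) = (delta0 / 2) * (cos kx - cos ky).

    kx, ky are in units of the inverse lattice constant; delta0 is a
    placeholder amplitude (arbitrary units), assumed for illustration.
    """
    return 0.5 * delta0 * (math.cos(kx) - math.cos(ky))


def bogoliubov_energy(xi, gap):
    """BCS quasiparticle energy E = sqrt(xi^2 + gap^2) for band energy xi."""
    return math.hypot(xi, gap)


# Zone diagonal ("node"): the gap vanishes.
node_gap = d_wave_gap(math.pi / 2, math.pi / 2)
# Zone boundary ("antinode"): the gap magnitude is maximal.
antinode_gap = d_wave_gap(math.pi, 0.0)
```

In this parametrization the gap changes sign between (π, 0) and (0, π), so comparisons with a measured pseudogap would use its magnitude.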

The more far-reaching notion of pairing correlations without substan- 
tial phase coherence persisting to temperatures of the order of T* is dif- 
ficult to define precisely, even in principle. The best circumstantial evidence 
comes from diamagnetism, which is observed up to about 150 K (ref. 72). 
Though weak compared to full Meissner screening, it is still large compared
to that of simple metals. Moreover, in underdoped YBa2Cu3O6+x
(YBCO), a moderately well-defined interlayer Josephson plasma resonance 
seems to persist up to similar temperatures”, and recent pump-probe ex- 
periments are consistent with transient superconducting order existing 
all the way to T* (ref. 74). Perhaps the most compelling evidence is in the 
temperature evolution of the gap itself; the nodal-antinodal dichotomy 
notwithstanding, the pseudogap above Tc evolves remarkably smoothly
into the gap of the superconducting state well below Tc. In that context, it
has been suggested by ARPES that the Fermi arc is simply due to the life- 
time broadening of a d-wave node’*”*, and this has been inferred as well 
by STS below Tc (ref. 77). The reconciliation of this superconducting-like
signature in the fermion response in the pseudogap phase and an energy 
gap due to competing order”* has been a major challenge, even more re- 
levant now, given the new findings of such crystalline order (see below). 


Competing orders 

Another increasingly well-documented feature of the pseudogap regime 
is a tendency towards a variety of orders (that is, patterns of broken sym- 
metry) in addition to superconductivity. Some involve ‘crystallization’ 
of the electrons in the form of stripes and other forms of charge order, 
but others appear to be more novel quantum liquids. There is also some 
evidence for new types of order involving various patterns of equilibrium 
orbital currents, and possibly a new sort of spatially modulated super- 
conducting state. 

Neutron scattering studies in the mid-1990s led to the experimental 
discovery of electronic ‘stripes’ in the La2−xSrxCuO4 (LSCO) family”.
These studies were inspired, in part, by earlier mean-field theories of 
density-wave formation in lightly doped Hubbard-like models”®. An alter- 
native and complementary theoretical perspective was based on the ob- 
servation that doping an insulating antiferromagnet produces a tendency 
to phase separation that is frustrated by the long-range Coulomb inter- 
action; the compromise is to form conducting stripe-like textures”'. The 
stripe order, characterized by incommensurate antiferromagnetic order 
and charge segregation, was initially found in underdoped versions of LSCO, 
where a low-temperature tetragonal lattice deformation apparently acts 
as a pinning potential for the stripes””. However, it became clear that these 
stripes were different from the ‘classical’ stripes in the other doped Mott 
insulators: the copper oxide stripes stay metallic and even superconduct 
at low temperatures. Although the spatial organization looks similar to 
the mean field stripes, a crucial difference is that these can now be viewed 
as a partially crystallized superconductor, formed from electron pairs*’”. 

In general terms, a competition between superconductivity and crys- 
tallization is a very natural way to account for the diminishing superfluid 
density in the pseudogap regime. Indeed, quite recently, evidence has em- 
erged that materials with static stripes form a “pair density wave”: the 
charge stripes are internally superconducting, but the phase reverses from 
stripe to stripe’. Given that the stripe orientation changes as one moves 
from one layer to the next, this frustrates the Josephson coupling between 
layers, entirely extinguishing the superfluid stiffness perpendicular to the 
planes and thus giving rise to a two-dimensional superconducting state 
consistent with transport measurements**™’. If confirmed, this would con- 
stitute the discovery of a new phase of matter. 

It was subsequently found, by inelastic neutron scattering, that the mag- 
netic spectrum of the ordered stripes has a unique ‘hourglass’ pattern’, 
with the neck of the hourglass located at the commensurate antiferro- 
magnetic wavevector, and this pattern has subsequently been observed in 
the insulating charge ordered states of manganites and cobaltates*’. 
Although this makes it natural to associate the inelastic neutron scattering 
spectrum with the spin waves associated with incommensurate magnetic 
order, in the copper oxides, this pattern persists for larger dopings, even 
where the stripe order is no longer condensed and where the character of 
the spin-wave spectrum changes dramatically when the temperature drops 
below Tc (ref. 84). In this regime, many salient features of the magnetic spectrum
are similar to what is expected for weakly interacting quasiparticles
in a homogeneous d-wave superconductor®. A reconciliation of these two
very different pictures remains a challenge for the field. 

For many years, static stripe order had seemed to be confined to the 
LSCO family. However, recently charge ordering was discovered in 
underdoped YBCO* and Bi- and Hg-based copper oxides®’**’. X-ray 
experiments find short-range incommensurate charge order that gradu- 
ally sets in between 100 K and 200 K (refs 88 and 89). Moreover, high- 
resolution X-ray scattering” and nuclear magnetic resonance experiments 
have confirmed that the short-range charge order is truly static, and thus 
presumably arises from pinning of correlated charge fluctuations by defects”’. 
Unlike in the stripes of the LSCO family, there is no evidence of coinci- 
dent static (or nearly static) magnetic order. Moreover, the variation of the 
stripe wavevector with doping in YBCO” and Bi-based copper oxides*”** 
is much weaker and has the opposite sign of that in LSCO. Whereas in 
LSCO, this wavevector increases with doping as would be expected in a 
real space picture, in YBCO and the Bi-based copper oxides, the wave- 
vector decreases, as would be expected from a momentum space picture 
involving vectors spanning the Fermi surface. This difference may be 
connected with differences in the spin behaviour: in YBCO a large spin 
gap is present that acts to suppress the incommensurate spin order that is 
more prevalent in the LSCO family**”’. 

Yet another interesting hint regarding the unusual relationship between 
the charge order and superconductivity follows from the temperature evo- 
lution of the charge order. The X-ray signal begins to build up smoothly 
upon cooling below a characteristic charge ordering temperature (TCDW)
typically less than T*, to attain a maximum at the superconducting Tc, then
drops noticeably below Tc, indicating competition between the charge
order and superconductivity*””. 

There is also evidence for ‘quantum nematic liquid crystal’ order oc- 
curring in the pseudogap phase. Such phases are translationally invariant 
but break point group (for example, rotational) symmetries. First sug- 
gested in the context of the quantum melting of stripe crystals”, evidence 
appeared for a phase breaking the fourfold symmetry of the square lattice 
in underdoped YBCO, from transport”*”* and inelastic neutron scatter- 
ing measurements”. This ‘nematic’ signal was also found in an analysis 
of the STS data, showing that besides the ‘stripy’ texture breaking trans- 
lations, there is also an overall (zero wavevector) breaking of rotations 
present, consistent with the two oxygen ions in the CuO2 unit becoming
inequivalent’’”®, though this is controversial”. 

These orders are all close siblings of the electron crystal. However, 
there is also evidence for a completely different kind of order below T* as 
well. This order is symmetry-wise equivalent to having magnetic mo- 
ments on the oxygen sites, and thus would be a magnetic analogue of the 
charge nematic mentioned above’. But the original proposal that moti- 
vated the experiments involved spontaneous electron currents flowing 
inside the CuO2 units in such a way that although rotational symmetry is
broken, translational symmetry is not’”’. It has a quite distinct magnetic 
diffraction pattern, which was subsequently seen by spin-flip neutron 
scattering'®’. If confirmed, this would again amount to the discovery of a
new phase of matter, though it does not yield a natural explanation for 
the pseudogap, just by the very fact that it does not break translational 
symmetry. The real difficulty with this proposal is that current order 
should also be seen by local magnetic probes such as muon spin reson- 
ance and nuclear magnetic resonance, but this has not been observed. 
Potentially related to this is the onset of a small Kerr rotation at T*, which 
also indicates some type of symmetry breaking”. This Kerr signal defines 
a phase line that cuts through the superconducting dome, vanishing near 
18% doping. 


The strange metal 

The strange metal regime was recognized early on as perhaps the most 
mysterious aspect of the copper oxide phase diagram. The most basic dif- 
ference between the strange metal and a conventional metal is the absence 
of quasiparticles. This has consequences for physical properties like the 
electrical resistivity. In a normal metal, unless the metal melts first, the resistivity
saturates at high temperatures when the mean free path, l, becomes
of the order of the electron de Broglie wavelength, λ. The resistivity of the
copper oxide strange metal can be linear in T from near Tc up to as high a
temperature as measured'®’, even when the inferred l would be smaller
than λ, which would violate the Heisenberg uncertainty principle for the
quasiparticles. Moreover, the Hall resistivity has a temperature depend- 
ence different from what would be expected in a quasiparticle picture™. 

In the late 1980s, some of these and various other experimental anom- 
alies were encapsulated in the phenomenological ‘marginal Fermi liquid’ 
theory”. This asserts that the Fermi gas is coupled to a continuum of excitations
that is spatially featureless, with a spectral density which is constant
for ω > T, but proportional to T for ω < T. This leads to a damping
rate that scales as max(ω, T). This was confirmed later by high-resolution
ARPES measurements, with the caveat that this is only seen in the nodal 
region, with the antinodal region behaving in a more incoherent fashion’. 
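
The marginal Fermi liquid damping rate described above can be written as a one-line function. This is only a sketch of the scaling form: it works in units where ħ = kB = 1 so that frequency and temperature are directly comparable, and the dimensionless prefactor `coupling` is a hypothetical parameter introduced for illustration, not a value taken from the theory or from experiment.

```python
def mfl_damping(omega, temperature, coupling=1.0):
    """Marginal-Fermi-liquid-style damping rate, Gamma ~ max(|omega|, T).

    Units with hbar = k_B = 1 are assumed; `coupling` is a hypothetical
    dimensionless prefactor used only to illustrate the scaling.
    """
    return coupling * max(abs(omega), temperature)


# Low frequency: the rate is set by temperature.
thermal_regime = mfl_damping(omega=0.1, temperature=1.0)
# High frequency: the rate grows linearly with frequency.
quantum_regime = mfl_damping(omega=5.0, temperature=1.0)
```

The crossover at ω ≈ T is the point of the construction: no intrinsic energy scale other than temperature or frequency enters the rate.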

In the 1990s it was proposed that quantum criticality could explain the 
low-energy excitations of the strange metal. A quantum phase transition 
occurs when a continuous phase transition occurs at zero temperature as 
a function of a tuning parameter (like pressure or doping), where the corresponding
QCP defines the boundary between the ordered (broken symmetry)
and disordered quantum phases’”*. The correlations at a QCP are
characterized by a spatio-temporal scale invariance, which in turn has the 
effect that there are no longer quasiparticle poles (δ-functions) in spectral
functions. Instead one finds power-law behaviour (‘branch cuts’) and spectral
functions at finite temperature that are scaling functions of ω/T. This
can be interpreted in terms of a dissipative energy relaxation time ħ/kBT,
which is sometimes referred to as “Planckian dissipation” because it is 
a quantum effect independent of material parameters’. Moving away 
from the critical point, the energy scale above which scale invariance 
remains gradually increases. Accordingly, in the ‘tuning parameter’- 
temperature plane, there is a quantum critical wedge opening up from 
the QCP. This suggests an interpretation of the phase diagram in Fig. 2, 
where the strange metal is identified with the quantum critical wedge 
associated with a QCP under the superconducting dome near optimal 
doping. 
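
For a sense of scale, the Planckian relaxation time ħ/kBT mentioned above can be evaluated directly. The sketch below uses the SI values of ħ and kB; the function name and the example temperature are illustrative choices, not quantities taken from the text.

```python
HBAR = 1.054571817e-34  # reduced Planck constant, J s
K_B = 1.380649e-23      # Boltzmann constant, J/K (exact in the SI)


def planckian_time(temperature_kelvin):
    """Return the Planckian relaxation time hbar / (k_B * T) in seconds."""
    if temperature_kelvin <= 0:
        raise ValueError("temperature must be positive")
    return HBAR / (K_B * temperature_kelvin)


# Illustrative temperature of 100 K.
tau_100K = planckian_time(100.0)
```

At 100 K this gives a time of roughly 76 fs, and the time halves each time the temperature doubles, with no material parameter entering.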

The theory of quantum criticality in metallic systems is still a work in 
progress. One issue is that there may be reasons to believe that the QCP 
is intrinsically unstable, since the order parameter fluctuations mediate 
attractive interactions that promote superconductivity, meaning that the 
QCP might always be ‘shielded’ by a superconducting dome, just as in 
Fig. 2. However, there is also typically a diverging correlation length at a 
QCP, while no such growing correlation length has yet been observed in 
the strange metal state of the copper oxides for any of the orders that are 
considered likely candidates. Moreover, according to the marginal Fermi 
liquid phenomenology’, what is needed is a special sort of quantum crit- 
icality that is local in space, and so featureless in k. 

Is there a QCP involving the termination of pseudogap order inside 
the superconducting dome? There is evidence for the termination of pseu- 
dogap order in a QCP from early specific heat data'®* and from a disper- 
sion anomaly seen in photoemission’ as well as a vanishing of the Kerr 
rotation’ and charge order”, with the latest being a divergence in the 
effective mass seen in quantum oscillation studies'’®. But which order param- 
eter rules the quantum critical regime'"', and is that regime large enough 
to encompass the entire strange metal region? We argued above that the 
pseudogap is characterized by several competing ordering tendencies. Even 
more seriously, this quantum critical description should break down at 
higher temperatures. But the strange metal persists to the highest attain- 
able temperatures. 

For a highly correlated fluid, the interactions are large and so probably 
cannot be treated using any fundamentally perturbative approach which 
starts with a free particle description. There is a well-developed and 
extremely successful theoretical solution of this problem applicable to 
one-dimensional and quasi-one-dimensional electron fluids based on 
‘bosonization’, but no such approach exists in higher dimensions. In this 
context, it is important to seek new approaches—theories that honestly 
treat the strong correlation physics—even if the connection to the relevant 
microscopic physics is unclear. This is where the mathematics of string 
theory may help: with the so-called holographic duality, one can address
the physics of strongly interacting finite density systems. Its central point, 
the mathematical anti-de Sitter to conformal field theory (AdS/CFT) correspondence,
has become a focus for modern string theoretical research.
Discovered in 1997'”, it demonstrates that the two grand theories of phys- 
ics, which seem unrelated (general relativity and quantum field theory) 
can become under certain conditions two sides of the same coin. According 
to the correspondence, there is a ‘holographic’ relation in the sense that 
the quantum field theory is like a two-dimensional photographic plate 
with interference fringes encoding the gravitational physics in three di- 
mensions. Most importantly, the difficult-to-solve strong-coupling quan- 
tum field theory is mapped to its more easily solved holographic dual, the 
weak-coupling gravity theory. 

Since 2007, the properties of matter at finite density have been the cen- 
tral focus of this ‘holography’ research’’’. At low temperatures one finds 
superconductors, stripe and current phases, and even Fermi liquids. The 
observable responses of these states are often similar to experimental ob- 
servations. However, the great difference is in the nature of the strange 
metal at higher temperatures. The gravity dual tells us that these systems 
at finite fermion density should form metallic quantum critical phases, 
where the scale invariance emerges without fine tuning to any special QCP. 
However, these “conformal” metals (which exhibit Planckian dissipation) 
are intrinsically unstable, and upon cooling spawn an extensive manifold 
of stable states. They also have special scaling properties that are different 
from any conventional quantum critical state'™*. 

Owing to the limitations of the mathematics, holography can only be 
proved for certain field theories that bear no resemblance whatsoever to
the electrons in the copper oxides. It is not currently known whether the 
traits discussed above are ubiquitous emergent phenomena or somehow 
tied to these special cases. At the least, however, holography may supply 
powerful new metaphors, teaching physicists to think differently, and lead- 
ing to new experimental questions. 


The overdoped regime 


As the doping is increased beyond the doping optimal for the supercon- 
ductivity, it appears that a real Fermi liquid begins to be established: quan- 
tum oscillations indicate a well-developed large Fermi surface, consistent 
in detail with the prediction of one electron band theory'”’. This is sup- 
ported by ARPES measurements where now sharp peaks are observed 
near this Fermi surface throughout the Brillouin zone (including the 
antinodes)**"’*. Inelastic neutron scattering data indicate a dramatic sup- 
pression of magnetic spectral weight near the antiferromagnetic wave vec- 
tors, which may be interpreted as a disappearance of the spin-fluctuation 
pairing glue, explaining why Tc goes down'’”. On the other hand, recent
resonant inelastic X-ray scattering data have demonstrated pronounced 
spin fluctuations at smaller wave vectors, implying that strong electron 
correlations persist even in highly overdoped copper oxides**'*. A big 
question is how different the Fermi liquid at lower temperatures really is 
from the anomalous strange metal at higher temperatures. ARPES shows 
that there is only a weak crossover line that separates these two regimes'””. 


Outlook 


Originally inspired by the desire to find out why superconductivity can 
happen at a high temperature, condensed matter scientists engaged in a 
relentless effort to unravel the physics of copper oxides. As we have emphasized,
there is still plenty of work to do, especially with regard to the
physics of competing order in the underdoped regime. The bottom line 
is that the existing theoretical machinery seems inadequate to describe both 
the rich physics of the pseudogap phase and the nature of the strange 
metal phase. 

Experimental techniques with which to control correlated electrons are 
evolving rapidly. Recent examples include the development of atomically 
precise layer-deposition methods that allow the tailoring of lattice struc- 
tures'”° and coherent optical control techniques”*. In another development, 
the practitioners of quantum information and string theory have landed 
in the same territory, finding to their surprise that they are struggling with 
many of the same issues as condensed matter physicists. This is also
reflected in the fact that some of the key underlying physics has been
captured by advanced numerical techniques like the density matrix renormalization
group and its descendants2°, which were also motivated by
quantum information theory. The jury is still out on whether this is a
coincidence or signals the onset of a revolution in physics.
New genetic loci link adipose and insulin
biology to body fat distribution
Body fat distribution is a heritable trait and a well-established predictor of adverse metabolic outcomes, independent of
overall adiposity. To increase our understanding of the genetic basis of body fat distribution and its molecular links to 
cardiometabolic traits, here we conduct genome-wide association meta-analyses of traits related to waist and hip cir- 
cumferences in up to 224,459 individuals. We identify 49 loci (33 new) associated with waist-to-hip ratio adjusted for 
body mass index (BMI), and an additional 19 loci newly associated with related waist and hip circumference measures 
(P < 5 × 10^-8). In total, 20 of the 49 waist-to-hip ratio adjusted for BMI loci show significant sexual dimorphism, 19 of which
display a stronger effect in women. The identified loci were enriched for genes expressed in adipose tissue and for putative 
regulatory elements in adipocytes. Pathway analyses implicated adipogenesis, angiogenesis, transcriptional regulation and 
insulin resistance as processes affecting fat distribution, providing insight into potential pathophysiological mechanisms. 


Depot-specific accumulation of fat, particularly in the central abdomen, 
confers an increased risk of metabolic and cardiovascular diseases and 
mortality’. An easily accessible measure of body fat distribution is waist- 
to-hip ratio (WHR), a comparison of waist and hip circumferences. A 
larger WHR indicates more intra-abdominal fat deposition and is asso- 
ciated with higher risk for type 2 diabetes (T2D) and cardiovascular 
disease**. Conversely, a smaller WHR indicates greater gluteal fat accu- 
mulation and is associated with lower risk for T2D, hypertension, dys- 
lipidemia and mortality*°. Our previous genome-wide association study 
(GWAS) meta-analyses have identified loci for WHR after adjusting for 
body mass index (WHRadjBMI)”*. These loci are enriched for asso- 
ciation with other metabolic traits’* and show that different fat distri- 
bution patterns can have distinct genetic components””°. 

To determine further the genetic architecture of fat distribution and 
to increase our understanding of molecular connections with cardio- 
metabolic traits, we performed a meta-analysis of WHRadjBMI associ- 
ations in 142,762 individuals with GWAS data and 81,697 individuals 
genotyped with the Metabochip", all from the Genetic Investigation of 
ANthropometric Traits (GIANT) consortium. Given the marked sex- 
ual dimorphism previously observed among established WHRadjBMI 
loci”*, we performed analyses in men and women separately, the results 
of which were subsequently combined. To characterize the genetic deter- 
minants of specific aspects of body fat distribution more fully, we 
performed secondary GWAS meta-analyses for five additional traits: 
unadjusted WHR, unadjusted waist circumference, BMI-adjusted waist 
circumference (WCadjBMI), unadjusted hip circumference and BMI- 
adjusted hip circumference (HIPadjBMI). We evaluated the associated 
loci to understand their contributions to variation in fat distribution 
and adipose tissue biology, and their molecular links to cardiometa- 
bolic traits. 


New loci associated with WHRadjBMI 


We performed meta-analyses of GWAS of WHRadjBMI in up to 
142,762 individuals of European ancestry from 57 new or previously 
described GWAS’, and separately in up to an additional 67,326 Euro- 
pean ancestry individuals from 44 Metabochip studies (Extended Data 
Fig. 1 and Supplementary Tables 1-3). The combination of these two 
meta-analyses included up to 2,542,447 autosomal single nucleotide 
polymorphisms (SNPs) in up to 210,088 European ancestry individuals. 
We defined new loci based on genome-wide significant association
(P < 5 × 10^-8 after genomic control correction at both the study-specific
and meta-analytic levels) and distance (>500 kilobases (kb) from previously
established loci)”*.
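
The two criteria above (genome-wide significance and distance from established loci) can be sketched as a simple filter. This is an illustration only: the function name, the representation of established loci as (chromosome, position) pairs, and the coordinates in the usage lines are hypothetical, and the sketch assumes the P value has already been corrected by genomic control.

```python
GENOME_WIDE_P = 5e-8        # significance threshold stated in the text
MIN_DISTANCE_BP = 500_000   # >500 kb from previously established loci


def is_new_locus(chrom, pos, p_value, known_loci):
    """Return True if a lead SNP qualifies as a new locus.

    known_loci is an iterable of (chrom, pos) pairs for established loci;
    this data layout is an assumption made for the sketch.
    """
    if p_value >= GENOME_WIDE_P:
        return False
    return all(
        k_chrom != chrom or abs(k_pos - pos) > MIN_DISTANCE_BP
        for k_chrom, k_pos in known_loci
    )


# Hypothetical coordinates, for illustration only.
known = [("1", 1_000_000)]
far_hit = is_new_locus("1", 1_600_000, 1e-9, known)   # significant, >500 kb away
near_hit = is_new_locus("1", 1_300_000, 1e-9, known)  # significant, but too close
```

Variants on a different chromosome from every established locus pass the distance criterion automatically, so only the significance threshold applies to them.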

We identified 49 loci for WHRadjBMI, 33 of which were new and 
16 previously described”*. Of these, a European ancestry sex-combined 
analysis identified 39 loci, 24 of which were new’* (Table 1, Supplemen- 
tary Table 4 and Supplementary Figs 1-3). European ancestry sex- 
specific analyses identified nine additional loci, eight of which were 
new and significant in women but not in men (all Pmen > 0.05; Table 1
and Supplementary Fig. 4). The addition of 14,371 individuals of non-European
ancestry genotyped on the Metabochip identified one additional
locus in women (rs1534696, near SNX10, Pwomen = 2.1 × 10^-8,
Pmen = 0.26, Table 1 and Supplementary Tables 1-3), with no evidence
of heterogeneity across ancestries (Phet = 0.86; Supplementary Note). 


Genetic architecture of WHRadjBMI 


To evaluate sexual dimorphism, we compared sex-specific effect size 
estimates of the 49 WHRadjBMI lead SNPs. The effect estimates were 
significantly different (Pdifference < 0.05/49 = 0.001) at 20 SNPs, 19 of
which showed larger effects in women (Table 1 and Extended Data
Fig. 2a), similar to previous findings’*. The only SNP that showed a
larger effect in men mapped near GDF5 (rs224333, βmen = 0.036 and
P = 9.0 × 10^-12, βwomen = 0.009 and P = 0.074, Pdifference = 6.4 × 10^-5),
a locus previously associated with height (rs6060369, r² = 0.96 and
rs143384, r² = 0.96, 1000 Genomes Project CEU), although without
significant differences between sexes’*’*. Consistent with the larger
number of loci identified in women, variance component analyses demonstrated
a significantly larger heritability (h²) of WHRadjBMI in
women than men in the Framingham Heart Study (h²women = 0.46,
h²men = 0.19, Pdifference = 0.0037) and TwinGene study (h²women = 0.56,
h²men = 0.32, Pdifference = 0.001; Supplementary Table 5 and Extended
Data Fig. 2b). 
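
The multiple-testing threshold used above, and one common way to test whether two sex-specific effect estimates differ, can be sketched as follows. The z-test shown assumes that the estimates in women and men are independent; the exact procedure used in the study is not described in this section, so this form is an assumption. The standard errors in the usage line are hypothetical placeholders, because only effect sizes and P values are quoted in the text.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def sex_difference_p(beta_women, se_women, beta_men, se_men):
    """Two-sided P value for a difference between two effect estimates.

    Assumes independent estimates, so the variance of the difference is
    the sum of the two squared standard errors.
    """
    z = (beta_women - beta_men) / math.sqrt(se_women**2 + se_men**2)
    return math.erfc(abs(z) / math.sqrt(2.0))


# 49 lead SNPs tested, as in the text: 0.05 / 49 is about 0.001.
threshold = bonferroni_threshold(0.05, 49)

# Effect sizes from the text for the GDF5 locus; standard errors are
# hypothetical values chosen only to make the example runnable.
p_diff = sex_difference_p(beta_women=0.009, se_women=0.005,
                          beta_men=0.036, se_men=0.005)
is_dimorphic = p_diff < threshold
```

With correlated sex-specific estimates, for example when related individuals contribute to both strata, the variance of the difference would need a covariance term that this sketch omits.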

To identify multiple association signals within observed loci, we per- 
formed approximate conditional analyses of the sex-combined and 
sex-specific summary statistics using GCTA™ (Supplementary Note). 
Several signals (P < 5 × 10^-8) were identified at nine loci (Extended
Data Table 1). Fitting SNPs jointly identified different lead SNPs in the 
sex-specific and sex-combined analyses. For example, the MAP3K1-ANKRD55
locus showed near-independent (linkage disequilibrium
(LD) r² < 0.01) SNPs 54 kb apart that were significant only in women
Table 1 | WHRadjBMI loci in sex-combined and sex-specific meta-analyses


Sex-combined Women Men Sex diff. 
SNP Chr Locus EA* EAF β P N β P N β P N Py
Novel loci achieving genome-wide significance in European-ancestry meta-analyses 
rs905938 = 1~—Ss«éDCST2 T 074 0025 7.3x10°!° 207,867 0.034 4.9x10-?° 115536 0.015 11x10? 92461 16x10? 
rs10919388 1  GORAB C 072 0.024 3.2x10°° 181,049 0.033 4.8x10-?° 102446 0.013 3.0x10°? 78,738 9.8x10-3 
rs1385167 2 + MEISI G 015 0.029 1.9x10-2 206619 0023 40x10 * 114668 0036 23x10’? 92085 1.6x101 
rs1569135 2 CALCRL A 0.53 0.021 5.6x1071° 209,906 0.023 69x10’ 116642 0.019 15x10 * 93,398 58x10! 
rs10804591 3. PLXNDI A 0.79 0.025 66x10° 209,921 0.040 6.1x10713 116667 0.004 53x10! 93,387 5.7x107® 
rs17451107 3. LEKRI T 061 0026 1.1x10-22 207,795 0.023 10x10 ° 115,735 0030 14x108 92194 35x101 
rs3805389 4 NMU A 028 0012 15x10? 209,218 0.027 46x10-® 16,226 -—0.007. 2.1x10°! 93,125 1.6x10~° 
rs9991328 4 FAM1I3A T O49 0.019 45x108 209,925 0.028 3.4x1071° 116652 0.007. 1.7x10! 93,407 8.5x1074 
rs303084 4 SPATA5- A 080 0.023 3.9x107® 209,941 0.029 34x10’ 116662 0.016 99x10°% 93,412 11x10! 
FGF2 
rs9687846 5 MAP3K1 A 0.19 0.024 7.1x10°8 208,181 0.041 3.8x10712 115,897 0.000 9.7x10°! 92,417 1.3x107® 
rs6556301 5 FGFR4 0.36 0.022 2.6x10-® 178874 0.018 7.1x10°* 101,638 0.029 1.0x10-® 77,370 1.4x107! 
rs7759742 6 BTNL2 A 051 0.023 4.4x10-™* 208,263 0.024 1.7x10°’ 115,648 0.023 55x10° 92,749 86x1071 
rs1776897. 6 HMGAI G 008 0.030 1.1x10°5 177,879 0.052 68x10-2 100,516 0.003 74x10! 77,497 1.8x10-* 
rs7801581 7  HOXA11 T 0.24 0.027. 3.7x1071° 195,215 0.025 7.7x10° 108866 0.029 24x10 ° 86483 69x10! 
rs7830933. 8 NKX2-6 A 077 0022 74x108 209,766 0.037 1.2x10-?2 116567 0.001 84x10! 93,333 1.4x10- 
rs12679556 8 MSC G 025 0.027 2.1x10774 203826 0.033 2.1x10°'!° 114369 0.017 4.2x10°3 89,591 28x10? 
rs10991437 9 ABCAI A 011 0031 1.0x10-® 209941 0.040 28x10 116644 0022 61x10% 93,430 7.2x10~? 
rs7917772 10 SFXN2 A 062 0014 56x10°5 209,642 0.027 5.5x10-° 16,514 -0.001 86x10! 93,263 2.3x10-5 
rs11231693 11 MACRODI- A 0.06 0.041 45x10 8 198072 0.068 2.7x10714 110,164 0.009 42x10! 88043 2.5x1075 
VEGFB 
rs4765219 12 CCDC92 C 067 0.028 1.6x10715 209,807 0.037 1.0x101* 116592 0.018 53x10 * 93,350 5.7x10°° 
rs8042543 15 KLF13 C 078 0.026 1.2x10-2 208,255 0.023 67x10-5 115,760 0.030 1.0x10°© 92629 3.6x107! 
rs8030605 15 RFX7 A 014 0.030 88x10-2 208374 0.031 1.0x10° 115,864 0.031 59x10° 92644 99x10"! 
rs1440372 15 SMAD6 C 071 0.024 1.1x10-7° 207,447 0.022 1.1x10°5 115,201 0027 14x10° 92380 5.2x10°1 
rs2925979 16 CMIP T O31 0018 12x10 207,828 0.032 3.4x10-%? 115431 -0.002 79x10! 92,531 1.2x10-6 
rs4646404 17 PEMT G 067 0.027 1.4x10774 198,196 0.034 5.3x10°!! 115337 0.017 2.5x10°3 87,857 2.6x10-? 
rs8066985 17 KCNU2 A 050 0018 14x10? 209,977 0.026 4.0x10-° 16,683 0.007. 1.9x10°! 93428 1.8x103 
rs12454712 18 BCL2 T 061 0016 10x10-* 169,793 0.035 1.1x10-2 96182-0007 25x10! 73,576 1.6x107” 
rs12608504 19 JUND A 036 0022 88x10-?° 209990 0.017 26x10 %* 116689 0028 11x10? 93435 1.2x1071 
rs4081724 19 CEBPA G 085 0.035 7.4x10-12 207,418 0.033 9.2x10°? 115,322 0.039 14x10? 92,230 5.0x1071 
rs979012. 20 BMP2 T 034 0027 3.3x10-%4 209941 0.026 10x10’ 116668 0.028 66x10 8 93,407 67x10} 
rs224333 20 GDF5 G 062 0.020 26x10 208025 0.009 7.4x10- 15,803 0.036 9.0x10-7? 92,356 6.4x1075 
rs6090583 20 EYA2 A 048 0.022 6.2x10-%4 209435 0.029 28x10 '° 116382 0.015 24x10 93187 32x10? 
Novel loci achieving genome-wide significance in all-ancestry meta-analyses 
rs1534696 7 SNX10 C 043 0.011 41.3x10°% 212,501 0.027 2.1x10-® 18,187 -—0.006 2.6x10°! 92,243 2.1x10-° 
Previously reported loci achieving genome-wide significance in European-ancestry meta-analyses 
rs2645294 1  TBX15- T 058 0.031 1.7x10-?% 209,808 0.035 15x10 '* 116596 0.027 15x10’ 93,346 2.0x1071 
WARS2 
rs714515. = 1 ~~ =DNM3- G 043 0.027 4.4x10-45 203401 0.029 1.8x10°'° 113939 0.025 85x10? 89,596 51x10} 
PIGC 
rs2820443. 1 #LYPLALI T 0.72 0.035 53x10! 209,975 0.062 5.7x10735 116,672 0.002 69x10! 93,437 2.6x10717 
rs10195252 2 GRB14- T 059 0027 59x10°'!5 209,395 0.052 4.7x10~°° 116,329 -0.003 53x10! 93,199 2.4x10717 
COBLL1 
rs17819328 3  PPARG G 043 0.021 24x10°° 208809 0.035 4.6x10-%4 116,072 0.005 33x10! 92871 5.1x10- 
rs2276824 3 PBRMI1t C 043 0.024 3.2x10744 208,901 0.028 3.7x10°° 116,128 0.020 14x10* 92,907 2.0x107! 
rs2371767 3  ADAMTS9 G 0.72 0.036 1.6x107° 194,506 0.056 1.2x10776 108624 0.012 35x10 7% 86016 3.6x10-? 
rs1045241 5 TNFAIP8- C 0.71 0.019 44x10°’ 209,710 0.035 6.6x10712 116,560 -0.001 93x10! 93,284 8.3x1077 
HSD17B4 
rs7705502 5 CPEB4 A 033 0027 4.7x10-%4 209,827 0.027 19x10°% 116609 0.027 23x10’ 93,352 >0.99 
rs1294410 6 LY86 C 063 0.031 2.0x1077® 209,830 0.037 1.6x10°'5 116624 0.025 14x10°° 93340 63x10? 
rs1358980 6 VEGFA T 047 0039 3.1x10°?? 206862 0.060 3.7x10~34 115,047 0.015 40x10? 1,949 3.7x10714 
rs1936805 6 RSPO3 T 051 0043 3.6x10-35 209,859 0.052 3.7x10 °° 116602 0.031 3.1x10°'° 93,392 1.0x1073 
rs10245353 7 NFE2L3 A 020 0.035 84x10-7® 210,008 0.041 79x10 1? 116,704 0.027 14x10° 93,438 7.2x10-? 
rs10842707 12 ITPR2- T 023 0032 4.4x10-7® 210023 0.041 61x10 15 116,704 0022 14x10* 93,453 11x10? 
SSPN 
rs1443512 12 HOXC13 A 0.24 0.028 69x10 1% 209,980 0.040 1.1x107* 116688 0.013 28x10% 93,425 1.6x10-4 
rs2294239 22 ZNRF3 A 059 0.025 7.2x10-*3 209,454 0.028 69x10-'° 116414 0.024 23x10°© 93,173 5.0x107! 
Loci achieving genome-wide significance (P < 5 × 10⁻⁸) in sex-combined and/or sex-specific meta-analyses. P values and β coefficients are for the association with WHRadjBMI in the meta-analyses of combined
GWAS and Metabochip studies. The smallest P value for each SNP is shown in bold. Chr, chromosome; EAF, effect allele frequency.
* The effect allele (EA) is the WHRadjBMI-increasing allele in the sex-combined analysis.
† Test for sex difference; values significant at the table-wise Bonferroni threshold of 0.05/49 = 1.02 × 10⁻³ are marked in bold.
‡ Locus previously named NISCH-STAB1. Additional analyses that showed no significant evidence of heterogeneity between studies or due to ascertainment are provided in Supplementary Tables 27 and 28
(Supplementary Note).


(rs3936510) or only in men (rs459193; Extended Data Table 1, Supplementary
Table 4). Other signals are more complex. The TBX15-WARS2 locus showed
different but correlated lead SNPs in men and women near WARS2 (r² = 0.43),
an independent signal near TBX15, and a distant independent signal near
SPAG17 (Fig. 1). At the HOXC gene cluster, conditional analyses identified
independent (r² < 0.01) SNPs ~80 kb apart near HOXC12-HOXC13-HOTAIR and near
HOXC4-HOXC6 (Fig. 1). These results suggest that association signals mapping
to the same locus might act on different underlying genes and may not
be relevant to the same sex.

We assessed the aggregate effects of the primary association signals
at the 49 WHRadjBMI loci by calculating sex-combined and sex-specific
risk scores based on genotypes of the lead SNPs. In a linear regression
model, the risk scores were associated with WHRadjBMI, with a stronger
effect in women than in men (overall effect per allele β = 0.001,
P = 6.7 × 10⁻⁴; women β = 0.002, P = 1.0 × 10⁻¹¹; men β = 7.0 × 10⁻⁴,
P = 0.02; Extended Data Fig. 3 and Supplementary Note). The 49 SNPs
explained 1.4% of the variance in WHRadjBMI overall, and more in women
(2.4%) than in men (0.8%) (Supplementary Table 6). Compared to the 16
previously reported loci, the new loci almost doubled the explained
variance in women and tripled that in men. We further estimated that
the sex-combined variance explained by all HapMap SNPs¹⁵ (h²G) is
12.1% (s.e.m. = 2.9%).

Figure 1 | Regional SNP association plots

At 17 loci with high-density coverage on the Metabochip, we used
association summary statistics to define credible sets of SNPs with a high 
probability of containing a likely functional variant. The 99% credible 
sets at seven loci spanned <20 kb, and at HOXC13 included only a single 
noncoding SNP (Supplementary Table 7 and Supplementary Fig. 5). 
Imputation up to higher density reference panels will provide greater 
coverage and may have more potential to localize functional variants. 
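
The idea of a credible set can be illustrated with a minimal sketch. It is not the procedure used in this study: it assumes exactly one causal variant per region, equal prior weight for every SNP, and uses exp(z²/2) as a simple stand-in for a Bayes factor. The function name `credible_set` and the example Z-scores are hypothetical.

```python
import math


def credible_set(z_scores, coverage=0.99):
    """Return (indices, posteriors) for the smallest SNP set reaching `coverage`.

    Assumptions (illustrative only): one causal variant in the region, equal
    priors, and exp(z^2 / 2) as an approximate Bayes factor for each SNP.
    """
    log_bf = [z * z / 2.0 for z in z_scores]
    top = max(log_bf)
    # Subtract the maximum before exponentiating to avoid overflow.
    weights = [math.exp(v - top) for v in log_bf]
    total = sum(weights)
    posterior = [w / total for w in weights]

    # Add SNPs from most to least probable until the target mass is reached.
    order = sorted(range(len(posterior)), key=lambda i: posterior[i], reverse=True)
    chosen, mass = [], 0.0
    for i in order:
        chosen.append(i)
        mass += posterior[i]
        if mass >= coverage:
            break
    return chosen, posterior


# Hypothetical region in which one SNP dominates the association signal.
indices, probabilities = credible_set([8.0, 2.0, 1.0, 0.5])
```

When one SNP carries nearly all of the posterior mass, the set collapses to that SNP, which is the situation reported above for HOXC13.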


WHRadjBMI variants and other traits 


Given the epidemiological correlations between central obesity and
other anthropometric and cardiometabolic measures and diseases, we
evaluated lead WHRadjBMI variants in association data from GWAS
consortia for 22 traits. In total, 17 of the 49 variants were associated
(P < 5 × 10⁻⁸) with at least one of the traits: high-density lipoprotein
cholesterol (HDL; n = 7 SNPs), triglycerides (n = 5), low-density
lipoprotein cholesterol (LDL; n = 2), adiponectin adjusted for BMI (n = 3),
fasting insulin adjusted for BMI (n = 2), T2D (n = 1), and height (n = 7)
(Supplementary Tables 8 and 9). WHRadjBMI SNPs also showed
enrichment for directional consistency among nominally significant
(P < 0.05) associations with these traits and also with fasting and 2-h
glucose, diastolic and systolic blood pressure, BMI and coronary artery
disease (CAD) (P_binomial < 0.05/23 = 0.0022; Extended Data Table 2);
these results were generally supported by meta-regression analysis of
the regression coefficient estimates (Supplementary Table 10).
Furthermore, our WHRadjBMI loci overlap with associations reported in the
National Human Genome Research Institute (NHGRI) GWAS catalogue
(Table 2 and Supplementary Table 11), the strongest of which is
the locus near LEKR1, which is associated (P = 2.0 X 10 *°) with birth 
weight. Unsupervised hierarchical clustering of the corresponding
matrix of association Z-scores showed three major clusters characterized
by patterns of anthropometric and metabolic traits (Extended
Data Fig. 4). These data extend knowledge about genetic links between
WHRadjBMI and insulin-resistance-related traits; whether this reflects
underlying causal relations between WHRadjBMI and these traits, or
pleiotropic loci, cannot be inferred from our data.
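
The directional-consistency enrichment described above rests on a binomial test. The sketch below shows one way such a test could be computed. It assumes a one-sided test against a null concordance probability of 0.5, and the counts (12 of 14) are hypothetical rather than taken from Extended Data Table 2.

```python
from math import comb


def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more concordant SNPs."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical trait: 12 of 14 nominally significant SNPs share the
# direction of effect expected from the WHRadjBMI-increasing allele.
p_value = binomial_upper_tail(12, 14)

# Multiple-testing threshold quoted in the text: 0.05 / 23.
threshold = 0.05 / 23
is_enriched = p_value < threshold
```

In this made-up case the tail probability is 106/16384, about 0.0065, which would not pass the 0.0022 threshold even though most SNPs are concordant.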


Potential functional WHRadjBMI variants 


We next examined variants in LD with the WHRadjBMI lead SNPs
(r² > 0.7) for predicted effects on protein sequence, copy number, and
Table 2 | Candidate genes at new WHRadjBMI loci


SNP Locus eQTL (P < 10⁻⁵)* GRAIL (P < 0.05)† DEPICT (FDR < 0.05)‡ Literature§ Other GWAS signals||
rs905938 DCST2 ZBTB7B (PB, blood) - - - -
rs10919388 GORAB - - - - -
rs1385167 MEIS1 - - - MEIS1 -
rs1569135 CALCRL - TFPI - CALCRL -
rs10804591 PLXND1 - - PLXND1 -
rs17451107 LEKR1 TIPARP (S, O), LEKR1 (S) - - Birth weight: CCNL1, LEKR1
rs3805389 NMU - - - NMU -
rs9991328 FAM13A FAM13A (S) FAM13A - FI: FAM13A
rs303084 SPATA5-FGF2 - FGF2 - FGF2, NUDT6, SPRY1 -
rs9687846 MAP3K1 - MAP3K1 - MAP3K1 FI, TG: ANKRD55, MAP3K1
rs6556301 FGFR4 - MXD3 - FGFR4 Height
rs7759742 BTNL2 HLA-DRA (S), KLHL31 (S) - (not analysed) - -
rs1776897 HMGA1 - - (not analysed) HMGA1 Height: HMGA1, C6orf106, LBH
rs1534696 SNX10 SNX10 (S), CBX3 (S) - - SNX10 -
rs7801581 HOXA11 - HOXA11 HOXA11 HOXA11 -
rs7830933 NKX2-6 STC1 (S) - - NKX2-6, STC1 -
rs12679556 MSC - EYA1 RP11-1102P16.1 MSC, EYA1 -
rs10991437 ABCA1 - - ABCA1 -
rs7917772 SFXN2 - - - SFXN2 Height
rs11231693 MACROD1-VEGFB - VEGFB MACROD1 MACROD1, VEGFB -
rs4765219 CCDC92 CCDC92 (S, O, L), FAM101A - - Adiponectin, FI, HDL, TG:
ZNF664 (S, O) CCDC92, ZNF664
rs8042543 KLF13 - KLF13 - KLF13 -
rs8030605 RFX7 - - - -
rs1440372 SMAD6 SMAD6 (blood) SMAD6 SMAD6 SMAD6 Height
rs2925979 CMIP CMIP (S) - - CMIP, PLCG2 Adiponectin, FI, HDL: CMIP
rs4646404 PEMT - - PEMT PEMT -
rs8066985 KCNJ2 - - - KCNJ2 -
rs12454712 BCL2 - - - BCL2 -
rs12608504 JUND KIAA1683 (PB, O), JUND JUND - JUND -
(LCL)
rs4081724 CEBPA - CEBPA - CEBPA, CEBPG -
rs979012 BMP2 - BMP2 BMP2 BMP2 Height: BMP2
rs224333 GDF5 CEP250 (S, O), UQCC GDF5 GDF5 GDF5 Height: GDF5, UQCC
(blood, S, O, L, LCL)
rs6090583 EYA2 - EYA2 EYA2 EYA2 -


Candidate genes based on secondary analyses or literature review. Details are provided in Supplementary Tables 8, 9, 11-13, 15, 19, 21 and Supplementary Note. The only non-synonymous variant in high LD with
an index SNP was GDF5 S276A. No copy number variants were identified. PB, peripheral blood mononuclear cells; FI, fasting insulin adjusted for BMI; HDL, high-density lipoprotein cholesterol; L, liver; LCL,
lymphoblastoid cell line; O, omental adipose; S, subcutaneous adipose; TG, triglycerides.
* Gene transcript levels associated with the SNP in the indicated tissue(s).
† Genes in pathways identified as enriched by GRAIL analysis.
‡ Significant (FDR < 5%) pathway genes derived by DEPICT using GWAS-only results.
§ Most plausible candidate genes based on literature review.
|| Traits associated at P < 5 × 10⁻⁸ in GWAS or the GWAS catalogue using the index SNP or a proxy, and the gene(s) named.


cis-regulatory effects on expression (Table 2, Supplementary Tables 12-15 
and Supplementary Note). At 11 of the new loci, lead WHRadjBMI SNPs 
were in LD with cis-expression quantitative trait loci (eQTLs) for tran- 
scripts in subcutaneous adipose tissue, omental adipose tissue, liver or 
blood cell types (Table 2 and Supplementary Table 15). No additional 
sex-specific eQTLs were identified, perhaps reflecting limited power 
(Supplementary Table 16). 

At the 11 WHRadjBMI loci containing eQTLs, we compared the
location of the candidate variants to regions of open chromatin (DNase
hypersensitivity and formaldehyde-assisted isolation of regulatory
elements (FAIRE)) and histone modification enrichment (histone 3 Lys 4
methylation (H3K4me1), H3K4me2, H3K4me3, histone 3 Lys 27
acetylation (H3K27ac), and H3K9ac) in adipose, liver, skeletal muscle, bone,
brain, blood and pancreatic islet tissues or cell lines (Supplementary
Table 17). At 7 of these 11 loci, at least one variant was located in a
putative regulatory element in two or more data sets from the same tissue as
the eQTL, suggesting that these elements may influence transcriptional
activity (Supplementary Table 18). For example, at LEKR1, five variants
in LD with the WHRadjBMI lead SNP are located in a 1.1-kb region
with evidence of enhancer activity (H3K4me1 and H3K27ac) in
adipose tissue (Extended Data Fig. 5a).

We also examined whether any variants overlapped with open
chromatin or histone modifications from only one of the tested tissues,
possibly reflecting tissue-specific regulatory elements (Supplementary
Table 18). For example, five variants in a 2.2-kb region, located 77 kb
upstream from a CALCRL transcription start site, overlapped with peaks
in at least five data sets in endothelial cells (Extended Data Fig. 5b),
suggesting that one or more of these variants may influence
transcriptional activity. CALCRL, which is expressed in endothelial cells, is
required for lipid absorption in the small intestine, and influences body
weight in mice. Other variants located in tissue-specific regulatory elements
were detected at NMU for endothelial cells, at KLF13 and MEIS1 for
liver, and at GORAB and MSC for bone (Supplementary Table 18).


Biological mechanisms 


To identify potential functional connections between genes mapping
to the 49 WHRadjBMI loci, we used three approaches (Supplementary
Note). A survey of literature using GRAIL identified 15 genes with
nominal significance (P < 0.05) for potential functional connectivity
(Table 2 and Supplementary Table 19). The predefined gene set
relationships across loci identified using MAGENTA highlighted signalling
pathways involving vascular endothelial growth factor (VEGF),
phosphatase and tensin (PTEN) homologue, the insulin receptor, and
peroxisome proliferator-activated receptors (Supplementary Table 20).
VEGF signalling plays a central, complex role in angiogenesis, insulin
resistance and obesity, and PTEN signalling promotes insulin
resistance. Analyses using Data-driven Expression Prioritized Integration
for Complex Traits (DEPICT) facilitated prioritization of genes at
associated loci, analyses of tissue specificity, and enrichment of
reconstituted gene sets through integration of association results with
expression data, protein-protein interactions, phenotypic data from gene
knockout studies in mice, and predefined gene sets. DEPICT identified at
least one prioritized gene (false discovery rate (FDR) < 5%) at nine loci
(Table 2 and Supplementary Table 21) and identified 234 reconstituted gene
sets (161 after pruning of overlapping gene sets) enriched for genes at
WHRadjBMI loci. Among these we highlight biologically plausible gene
sets suggesting roles in body fat regulation (including adiponectin
signalling, insulin sensitivity and regulation of glucose levels), skeletal
growth, transcriptional regulation and development (Fig. 2 and
Supplementary Table 22). We also note gene sets that are specific for
abundance or development of metabolically active tissues including adipose,
heart, liver and muscle. Specific genes at the loci were significantly
enriched (FDR < 5%) for expression in adipocyte-related tissues, including
abdominal subcutaneous fat (Fig. 2 and Supplementary Table 23). Together,
these analyses identified processes related to insulin and adipose
biology and highlight mesenchymal tissues, especially adipose tissue, as
important to WHRadjBMI.

We also tested variants at the 49 WHRadjBMI loci for overlap with
elements from 60 selected regulatory data sets from the ENCODE
and Epigenomic RoadMap data and found evidence of enrichment
in 12 data sets (P < 0.05/60 = 8.3 × 10⁻⁴; Extended Data Table 3). The
strongest enrichments were detected for data sets typically attributed to
enhancer activity (H3K4me1 and H3K27ac) in adipose, muscle,
endothelial cells, and bone, suggesting that variants may regulate
transcription in these tissues. These analyses point to mechanisms linking
WHRadjBMI loci to regulation of gene expression in tissues highly relevant
for adipocyte metabolism and insulin resistance.

We also reviewed functions of candidate genes located near new and
previously established WHRadjBMI loci, identifying genes involved
in adipogenesis, angiogenesis and transcriptional regulation (Table 2,
literature review in the Supplementary Note). Adipogenesis candidate
genes include CEBPA, PPARG, BMP2, HOXC-mir196, SPRY1, TBX15,
and PEMT. Of these, CEBPA and PPARG are essential for white adipose
tissue differentiation, BMP2 induces differentiation of mesenchymal
stem cells towards adipogenesis or osteogenesis, and HOXC8 is a
repressor of brown adipogenesis in mice that is regulated by miR-196a
(ref. 28), also located within the HOXC region (Fig. 1). Angiogenesis
genes may influence expansion and loss of adipose tissue; they include
VEGFA, VEGFB, RSPO3, STAB1, WARS2, PLXND1, MEIS1, FGF2,
SMAD6 and CALCRL. VEGFB is involved in endothelial targeting of
lipids to peripheral tissues, and PLXND1 limits blood vessel branching,
antagonizes VEGF, and affects adipose inflammation.
Transcriptional regulators at WHRadjBMI loci include CEBPA, PPARG,
MSC, SMAD6, HOXA, HOXC, ZBTB7B, JUND, KLF13, MEIS1, RFX7,
NKX2-6 and HMGA1. Other candidate genes include NMU, FGFR4 and
HMGA1, for which mice deficient for the corresponding genes exhibit
obesity, glucose intolerance and/or insulin resistance.


Five additional central obesity traits 


To determine whether the WHRadjBMI variants exert their effects
primarily through waist circumference or hip circumference and to
identify loci that are not reported for WHRadjBMI, BMI or height, we
performed association analyses for five additional traits: WCadjBMI,
HIPadjBMI, WHR, waist circumference and hip circumference. On
the basis of phenotypic data alone, waist circumference and hip
circumference are highly correlated with BMI (r = 0.59-0.92), and WHR is
highly correlated with WHRadjBMI (r = 0.82-0.95), while WCadjBMI
and HIPadjBMI are moderately correlated with height (r = 0.24-0.63;
Supplementary Table 24). In contrast to WHRadjBMI, which has almost
no genetic correlation (see Methods) with height (rG < 0.04; Extended
Data Fig. 2c), WCadjBMI (rG = 0.42) and HIPadjBMI (rG = 0.82) have
moderate genetic correlations with height. These data suggest that some,
but not all, WCadjBMI and HIPadjBMI loci would be associated with
height.

Across all meta-analyses, we identified an additional 19 loci
associated with one of the five traits (P < 5 × 10⁻⁸), nine of which showed
Table 3 | New loci achieving genome-wide evidence of association (P < 5 × 10⁻⁸) with additional waist and hip circumference traits


Sex-combined Women Men Sex diff. 

SNP Trait Chr Locus EA* EAF β P N β P N β P N P†
Loci achieving genome-wide significance in European-ancestry meta-analyses 
rs10925060 WCadjBM OR2W5- T 0.03 0.017 2.2x10°5 140,515 0.002 68x10! 85,186 0.045 9.1x10773 55,522 1.7x10-8 

NLRP3 
rs10929925 HIP 2  SOX11 C 0.55 0.020 4.5x10-® 207,648 0.021 9.0x10~° 5,428 0.018 3.2x10* 92499 61x10"! 
rs2124969 WCadjBMI 2 /TGB6 C 0.42 0.020 7.1x10-2 231,284 0.016 3.5x10°-* 127,437 0.025 2.3x10°? 104,039 1.4x107} 
rs17472426 WCadjBMI 5 CCNJL T 092 0.014 3.1x10°2 217,564 -0.014 1.0x107! 9,804 0.052 4.3x10-® 97,954 3.9x10-8 
rs7739232 HIPadjBMI 6 KLHL31 A 0.07 0.037 54x10-° 131,877 0.063 1.0x10-® 80,475 -0.004 7.5x107! 51,589 2.9x1075 
r$13241538 HIPadjBM| 7 KLF14 C 048 0.017 1.6x10-° 210,935 0.033 9.9x10~74 7,210 —0.003 5.0x107! 93,911 2.0x10-° 
rs7044106 HIPadjBMI| 9 C5 C 0.24 0.023 4.1x10°° 143,412 0.039 5.7x10-2 86,733 -0.003 69x10! 56865 1.3x10-5 
rs11607976 HIP 11 MYEOV C 0.70 0.022 4.2x10-® 212,815 0.019 1.9x10~* 8,391 0.024 7.7x10°§ 94,701 44x107} 
rs1784203 WCadjBMI 11 KIAA1731 A 0.01 0.031 1.3x1078 63,892 0.000 9.9x107! 35,539 0.075 1.0x10-2° 28,353 1.2x107! 
rs1394461 WHR 11 CNIN5 C 0.25 0.017 4.7x10°* 144,349 0.035 3.6x107® 987,441 -0.011 1.6x10°1 57,094 1.1x10-6 
rs319564 WHR 13. GPC6 C 045 0.014 34x10°° 212,137 0.003 5.3E-01 7,970 0.027 1.6x10-® 94350 6.0x10-5 
rs2047937 WCadjBMI 16 ZNF423  C 0.50 0.019 4.7x10-® 231,009 0.022 5.5x10-’ 127,288 0.014 36x107% 103,914 2.0x107! 
rs2034088 HIPadjBM!| 17 VPS53 T 053 0.021 48x10? 210,737 0.028 9.6x1071° 7,142 0.014 65x107% 93,781 25x10? 
rs1053593 HIPadjBMI 22 HMGXB4 T 0.65 0.021 3.9x10-® 202,070 0.029 1.8x10-° 4,347 0.011 51x10? 87,908 62x10~3 
Loci achieving genome-wide significance in all-ancestry meta-analyses 
rs1664789 WCadjBM| 5 ARLI15 C 041 0.014 2.6x10°° 244,110 0.005 2.8x107! 133,052 0.026 3.6x10-® 109,025 4.4x10-* 
rs722585 HIPadjBMI| 6 GMDS G 068 0.015 2.1x10°* 205,815 -0.001 88x10"! 3,965 0.032 9.2x10-° 89,831 4.3x10-° 
rs1144 WCadjBMI 7 SRPK2 C 0.34 0.019 3.1x10-® 239,342 0.020 1.2x10°° 131,398 0.018 4.1x107-* 105,911 7.8x107} 
182398893 WHR 9 PTPDC1 A 0.71 0.020 4.0x10-® 226572 0.019 5.1x10°° 124,577 0.019 2.7x10°* 99,968 9.5x107} 
rs4985155t HIP 16 PDXDC1 A 0.66 0.018 4.5x10°7 227,296 0.011 1.6x10°? 125,048 0.029 9.7x10-® 100,313 6.3x107° 
P values and β coefficients are for the association with the trait indicated in the meta-analysis of combined GWAS and Metabochip studies. The smallest P value for each SNP is shown in bold.
* The effect allele is the trait-increasing allele in the sex-combined analysis.
† Test for sex difference; values significant at the table-wise Bonferroni threshold of 0.05/19 = 2.63 × 10⁻³ are marked in bold.
$P=7.3 xX 10° © with height in ref. 43 (index SNP rs1136001; r° = 0.79, distance = 2,515 base pairs (bp)). 


significantly larger effects (P_difference < 0.05/19 = 0.003) in one sex than
in the other (Table 3, Supplementary Figs 1-4 and Supplementary
Table 25). Three of four new loci with larger effects in women were
associated with HIPadjBMI and three of five new loci with larger effects
in men were associated with WCadjBMI. Most of the 19 loci showed
some evidence of association with WHRadjBMI in sex-combined or
sex-specific analyses, but four loci showed no association (P > 0.01) with
WHRadjBMI, BMI, or height (Supplementary Tables 8 and 26).

We next asked whether the genes and pathways influencing these
five traits are shared with WHRadjBMI or are distinct. Candidate genes
were identified based on association with other traits, eQTLs, GRAIL and
literature review (Extended Data Table 4 and Supplementary Tables 8,
11-13, 15-16 and 19). Candidate variants identified based on LD
(r² > 0.7) included coding variants in NTAN1 and HMGXB4, and six
loci showed significant eQTLs in subcutaneous adipose tissue. On the
basis of the literature, several candidate genes are involved in
adipogenesis and insulin resistance. For example, delayed induction of
preadipocyte transcription factor ZNF423 in fibroblasts results in
delayed adipogenesis, and NLRP3 is part of inflammasome and
proinflammatory T-cell populations in adipose tissue that contribute to
inflammation and insulin resistance. GRAIL analyses identified
connections that partially overlap with those identified for WHRadjBMI
(Supplementary Table 19). Taken together, the additional loci appear
to function in processes similar to the WHRadjBMI loci. The
identification of loci that are more strongly associated with WCadjBMI or
HIPadjBMI than the other anthropometric traits suggests that the
additional traits characterize aspects of central obesity and fat distribution
that are not captured by WHRadjBMI or BMI alone.


Discussion 


These meta-analyses of GWAS and Metabochip data in up to 224,459
individuals identified additional loci associated with waist and hip
circumference measures and help to determine the role of common genetic
variation in body fat distribution that is distinct from BMI and height.
Our results emphasize the strong sexual dimorphism in the genetic
regulation of fat distribution traits, a characteristic not observed for
overall obesity as assessed by BMI. Differences in body fat distribution
between the sexes emerge in childhood, become more apparent during puberty,
and change with menopause, generally attributed to the influence of sex
hormones. At loci with stronger effects in one sex than the other, these
hormones may interact with transcription factors to regulate gene activity.

Annotation of the loci emphasized the role for mesenchymally derived
tissues, especially adipose tissue, in fat distribution and central obesity.
The development and regulation of adipose tissue deposition is closely
associated with angiogenesis, a process highlighted by candidate genes
at several WHRadjBMI loci. These tissues are implicated in insulin
resistance, consistent with the enrichment of shared GWAS signals with
lipids, T2D, and glycaemic traits. The identification of skeletal growth
processes suggests that the underlying genes affect early development
and/or differentiation of adipocytes from mesenchymal stem cells. By
contrast, BMI has a substantial neuronal component, involving
processes such as appetite regulation. Our results provide a foundation
for future biological research in the regulation of body fat distribution
and its connections with cardiometabolic traits, and offer potential target
mechanisms for interventions in the risks associated with abdominal
fat accumulation.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


Study overview. Our study included 224,459 individuals of European, east Asian, 
south Asian and African-American ancestry. The European ancestry arm included 
142,762 individuals from 57 cohorts genotyped with genome-wide SNP arrays and 
67,326 individuals from 44 cohorts genotyped with the Metabochip (Extended
Data Fig. 1 and Supplementary Table 1). The non-European ancestry arm comprised 
~1,700 individuals from one cohort of east Asian ancestry, ~3,400 individuals 
from one cohort of south Asian ancestry, and ~9,200 individuals from six cohorts 
of African-American ancestry, all genotyped with the Metabochip. There was no 
overlap between individuals genotyped with genome-wide SNP arrays and Meta- 
bochip. For each study, local institutional committees approved study protocols 
and confirmed that informed consent was obtained. No statistical methods were 
used to predetermine sample size. 

Traits. Our primary trait was WHRadjBMI, the ratio of waist and hip
circumferences adjusted for age, age-squared, study-specific covariates if necessary and BMI.
For each cohort, residuals were calculated for men and women separately and then 
transformed by the inverse standard normal function. Cohorts with related men 
and women provided inverse standard normal transformed sex-combined residuals. 
For each cohort, the same transformations were applied to other traits: (1) WHR 
without adjustment for BMI (WHR); (2) waist circumference with (WCadjBMI) 
and without adjustment for BMI; and (3) hip circumference with (HIPadjBMI) 
and without adjustment for BMI. 
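
The inverse standard normal transformation of residuals can be sketched as follows. This is an illustration rather than the cohorts' actual code: the input is assumed to be a list of already-computed regression residuals, and the rank offset (rank - 0.5)/n and the average-rank handling of ties are assumptions, since the text does not specify them.

```python
from statistics import NormalDist


def inverse_normal_transform(residuals):
    """Map residuals to standard-normal quantiles by rank (ties share a rank)."""
    n = len(residuals)
    order = sorted(range(n), key=lambda i: residuals[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < n and residuals[order[j + 1]] == residuals[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the run
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    normal = NormalDist()
    # Assumed offset: (rank - 0.5) / n keeps every quantile strictly in (0, 1).
    return [normal.inv_cdf((r - 0.5) / n) for r in ranks]


# Hypothetical residuals for three individuals of one sex in one cohort.
transformed = inverse_normal_transform([0.91, 0.85, 0.99])
```

Applying the function separately to men and women within each cohort mirrors the sex-specific transformation described above.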

European ancestry meta-analysis for genome-wide SNP array data. Sample and
SNP quality control were undertaken within each cohort (Supplementary Table 3).
The GWAS scaffold in each cohort was imputed up to CEU haplotypes from
HapMap resulting in ~2.5 million SNPs. Each directly typed and imputed SNP
passing quality control was tested for association with each trait under an additive
model in a linear regression framework (Supplementary Table 3). SNP positions are
reported based on NCBI Build 36. For each cohort, sex-specific association
summary statistics were corrected for residual population structure using the genomic
control inflation factor (median λGC = 1.01, range = 0.99-1.08). SNPs were removed
before meta-analysis if they had a minor allele count ≤3, deviation from
Hardy-Weinberg equilibrium exact P < 10⁻⁶, directly genotyped SNP call rate <95%, or
low imputation quality (below 0.3 for MACH, 0.4 for IMPUTE, and 0.8 for PLINK).
Association summary statistics for each trait were combined via inverse-variance
weighted fixed-effects meta-analysis and corrected for a second round of genomic
control to account for structure between cohorts (Extended Data Fig. 1 and
Supplementary Fig. 1).
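
A minimal sketch of genomic control correction followed by inverse-variance weighted fixed-effects meta-analysis is given below. It illustrates the standard formulas only and is not the METAL implementation used in the study. The function names and example values are hypothetical, and leaving an inflation factor below 1 uncorrected is an assumed convention.

```python
import math
from statistics import NormalDist, median

# Median of a 1-d.f. chi-square distribution, about 0.455.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_control_lambda(betas, ses):
    """Median association chi-square divided by its expectation under the null."""
    chi2 = [(b / s) ** 2 for b, s in zip(betas, ses)]
    return median(chi2) / CHI2_1DF_MEDIAN


def gc_correct_se(se, lam):
    """Inflate a standard error by sqrt(lambda); lambda < 1 is left as is (assumed)."""
    return se * math.sqrt(max(lam, 1.0))


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted pooled beta, its standard error and two-sided P."""
    weights = [1.0 / (s * s) for s in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    # erfc keeps precision for very small P values: P = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return beta, se, p


# Hypothetical SNP measured in two cohorts with equal standard errors.
pooled_beta, pooled_se, pooled_p = fixed_effects_meta([0.02, 0.04], [0.01, 0.01])
```

With equal standard errors the pooled estimate is the simple mean of the cohort estimates, and the pooled standard error shrinks by a factor of √2.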

European ancestry meta-analysis for Metabochip data. Sample and SNP quality
control analyses were undertaken in each cohort (Supplementary Table 3). Each
SNP passing quality control was tested for association with each trait under an
additive model using linear regression. The Metabochip array is enriched, by
design, for loci associated with anthropometric and cardiometabolic traits; thus, we
based our correction on 4,425 SNPs selected for inclusion based on associations
with QT-interval that were not expected to be associated with anthropometric
traits (>500 kb from variants on Metabochip for these traits). These study-specific
inflation factors had a median λGC = 1.01 (range 0.93-1.11), with only one study
exceeding 1.10. After removing SNPs for quality control as described in the
previous section, association summary statistics were combined via inverse-variance
weighted fixed-effects meta-analysis and corrected for a second round of genomic
control on the basis of QT-interval SNPs to account for structure between cohorts.

European ancestry meta-analyses. Association summary statistics from the two
parts of the European ancestry arm were combined via inverse-variance weighted
fixed-effects meta-analysis using METAL with no further genomic control
correction. Results were reported for SNPs with a sex-combined sample size ≥50,000.
The meta-analyses were repeated for men and women separately for each trait.
Analyses were corrected for population structure within each sex. The meta-analysis
of WHRadjBMI in men included up to 93,480 individuals, and in women up to
116,742 individuals.

Meta-analyses of studies of all ancestries. Sample and SNP quality control, tests 
of association, genomic control correction (median λGC = 1.01, range = 0.90-1.17,
with only one study exceeding 1.10), and meta-analyses were performed as described 
above. Association summary statistics from the European and non-European ances- 
try meta-analyses were combined via inverse-variance weighted fixed-effects meta- 
analysis without further genomic control correction. 

Heterogeneity. For each lead SNP, we tested for sex differences based on the sex- 
specific beta estimates and standard errors, while accounting for potential correla- 
tion between estimates as previously described’®. Similarly, we tested for potential 
differences in effects between European and non-European samples, comparing 
the effects from GWAS + Metabochip data for Europeans and Metabochip data for
non-Europeans, and we tested for differences between population-based studies and
samples ascertained on diabetes status, cardiovascular disease, or both. In assessing
effects of ascertainment overall, we compared effects in seven subsets of our
study sample using population-based studies (that is, those not ascertained on any
phenotype) as the referent population: (1) all studies ascertained on any phenotype,
(2) T2D cases, (3) T2D controls, (4) T2D cases plus controls, (5) CAD cases,
(6) CAD controls and (7) CAD cases plus controls. We evaluated significance for
heterogeneity tests within each comparison using a Bonferroni-corrected P value
of 0.05/49 = 1.02 × 10⁻³ as well as an FDR threshold (ref. 48) of <5% (Supplementary
Table 28). Between-study heterogeneity in all meta-analyses was assessed
using I² statistics (ref. 49).
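
As a rough illustration of the heterogeneity tests, the sketch below compares two sex-specific estimates while allowing for correlation between them, and computes I² from Cochran's Q. The exact test statistic used in the cited work is not given in this section, so the z-test form, the correlation parameter `r`, and all function names are assumptions.

```python
import math


def sex_difference_test(beta_men, se_men, beta_women, se_women, r=0.0):
    """Two-sided z-test for a difference between sex-specific effects.

    r is the assumed correlation between the two estimates (0 if the
    samples are independent).
    """
    se_diff = math.sqrt(se_men ** 2 + se_women ** 2
                        - 2.0 * r * se_men * se_women)
    z = (beta_men - beta_women) / se_diff
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    return z, p


def i_squared(betas, standard_errors):
    """I^2 (in percent) from Cochran's Q for a fixed-effects meta-analysis."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    if q <= 0.0 or df == 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Bonferroni threshold quoted in the text: 0.05 / 49 tests.
bonferroni_threshold = 0.05 / 49

# Hypothetical SNP with a larger effect in women than in men.
z, p = sex_difference_test(0.02, 0.005, 0.05, 0.005)
significant = p < bonferroni_threshold
```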

Heritability and genetic and phenotypic correlations of waist traits. We cal- 
culated the heritability and genetic correlations of several central obesity traits 
using variance component models (refs 50, 51) in the Framingham Heart Study (FHS) and
TwinGene study. In this approach, the phenotypic variance is decomposed into
additive genetic, non-additive genetic, and environmental sources of variation
(including model error), and for sets of traits, the covariances between traits. We report
narrow-sense heritability (h²), the ratio of the additive genetic variance to the total
phenotypic variance. Sex-specific inverse normal trait residuals, adjusted for age
(and cohort in FHS), were used to estimate heritability separately in men and
women, using variance components analysis in SOLAR v.4.2.7 (FHS; ref. 52) or Mx 1.703
(TwinGene; ref. 53). Additionally, the sex-specific residuals were used to conduct bivariate
quantitative variance component genetic analyses that calculate genetic and
environmental correlations between traits. The genetic correlations obtained are 
estimates of the additive effects of shared genes, and a genetic correlation signifi- 
cantly different from zero suggests a direct influence of the same genes on more than 
one trait. Similarly, significant environmental correlations suggest shared environ- 
mental effects. 

We estimated sex-stratified correlations between all waist traits, as well as BMI,
height, and weight in TwinGene, FHS, KORA and EGCUT. In TwinGene and FHS,
age-adjusted Pearson correlations were used; in EGCUT and KORA, correlations 
were adjusted for age and age-squared. 

European ancestry approximate conditional analyses. To evaluate the evidence
for multiple association signals within identified loci, we performed approximate 
conditional analyses of sex-combined, women-specific and men-specific data as 
implemented in the GCTA software'***. This approach makes use of association 
summary statistics from the combined European ancestry meta-analysis and a 
reference data set of individual-level genotype data to estimate LD between var- 
iants and hence also the approximate correlation between allelic effect estimates 
in a joint association model. 

To evaluate robustness of the GCTA results, we performed analyses using two
reference data sets: Prospective Investigation of the Vasculature in Uppsala Seniors
(PIVUS), consisting of 949 individuals from Uppsala County, Sweden, with both
GWAS and Metabochip genotype data; and Atherosclerosis Risk in Communities
(ARIC), consisting of 6,654 individuals of European descent from four communities
in the United States with GWAS data. Both GWAS data sets were imputed using
data from phase II of the International HapMap Project (ref. 55). Results shown use the
PIVUS reference data set because Metabochip genotypes are available (see a comparison
in the Supplementary Note). Assuming that the LD correlations between
SNPs more than 10 megabases (Mb) away are zero, and using each reference data set
in turn, we performed a genome-wide stepwise selection procedure to select associated
SNPs one-by-one at P < 5 × 10⁻⁸. For each locus at which multiple association
signals were observed in the sex-combined, women-, and/or men-specific
data, the SNPs selected by GCTA as independently associated with WHRadjBMI 
in any of the three meta-analyses are reported, with the SNP identified in the sex- 
combined analysis taken by default when proxies are identified in the women- and/ 
or men-specific analyses. For SNPs not selected by a particular joint conditional 
analysis, but identified by either of the other two analyses, summary statistics were 
calculated for association analysis of the SNP conditioned on the GCTA-selected 
SNP(s). 

Genetic risk score. We calculated a genetic risk score for each individual in the
population-based KORA study (1,670 men and 1,750 women) using the 49 identified 
variants, weighted by the allelic effect from the European ancestry meta-analyses of 
WHRadjBMI. Sex-combined scores were computed on the basis of the sex-combined 
meta-analysis. Sex-stratified scores were calculated on the basis of men- and women- 
specific meta-analyses, in which SNPs not achieving nominal significance in the 
respective sex (P ≥ 0.05) were excluded. For each individual, the sex-combined
and sex-stratified risk scores were rounded to the nearest integer for plotting. Risk 
scores were then tested for association with WHRadjBMI using linear regression. 
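
A weighted genetic risk score of this kind can be sketched as follows. The section does not state how the weighted sum was scaled before rounding, so dividing by the mean weight (to express the score in units of risk alleles) is an assumption, as are the function name and the example values.

```python
def weighted_risk_score(dosages, weights):
    """Weighted count of trait-increasing alleles for one individual.

    dosages: number of trait-increasing alleles (0-2) at each SNP.
    weights: per-allele effect estimates from the meta-analysis.
    The sum is rescaled by the mean weight (an assumed convention) so that
    the score is comparable to an unweighted allele count.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    weighted_sum = sum(d * w for d, w in zip(dosages, weights))
    mean_weight = sum(weights) / len(weights)
    return weighted_sum / mean_weight


# Hypothetical individual genotyped at three SNPs.
score = weighted_risk_score([2, 1, 0], [0.03, 0.02, 0.01])
plot_category = round(score)  # rounded to the nearest integer for plotting
```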

Explained variance. We calculated the variance explained by a single SNP as:

$$\frac{\beta^{2} \cdot 2 \cdot \mathrm{MAF} \cdot (1 - \mathrm{MAF})}{\mathrm{Var}(Y)}$$

in which MAF is the minor allele frequency, β is the SNP effect estimate computed
by meta-analysis, and Var(Y) is the variance of the phenotype Y as it went into the
study-specific association testing. To derive the total variance explained by a set of
independent SNPs, we computed the sum of single-SNP explained variances under
the assumption of independent contributions.
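
The per-SNP and total explained variance can be computed directly from these definitions. This is a minimal sketch; the function names and the example SNPs are hypothetical.

```python
def explained_variance(beta, maf, var_y=1.0):
    """Fraction of phenotypic variance explained by one SNP:
    2 * MAF * (1 - MAF) * beta^2 / Var(Y)."""
    return 2.0 * maf * (1.0 - maf) * beta ** 2 / var_y


def total_explained_variance(snps, var_y=1.0):
    """Sum of single-SNP contributions, assuming independent SNPs.

    snps: iterable of (beta, maf) pairs.
    """
    return sum(explained_variance(beta, maf, var_y) for beta, maf in snps)


# Hypothetical SNPs on an inverse-normal-transformed trait (Var(Y) = 1).
single = explained_variance(beta=0.03, maf=0.3)
total = total_explained_variance([(0.03, 0.3), (0.02, 0.5)])
```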

To estimate the polygenic variance explained by all HapMap SNPs, we used the
all-SNP estimation approach implemented in GCTA and analysed individuals in
the ARIC and TwinGene cohorts, including the first 20 principal components as 
fixed covariates. After removing one of each pair of individuals with estimated 
genetic relatedness >0.025, 11,898 unrelated individuals with WHRadjBMI were 
available. 

Fine-mapping analyses. We considered each identified locus, defined as 500 kb
upstream and downstream of the lead SNP, and computed 99% credible intervals
using a Bayesian approach. On the basis of association summary statistics from
the European ancestry, non-European ancestry, or all ancestries sex-combined
meta-analyses, we calculated an approximate Bayes' factor (ref. 56) in favour of
association, given by:

$$\mathrm{BF}_j = \sqrt{1 - R_j}\,\exp\left[\frac{R_j \beta_j^{2}}{2\sigma_j^{2}}\right]$$

in which $\beta_j$ is the allelic effect of the jth SNP, with corresponding standard error
$\sigma_j$, and $R_j = 0.04/(\sigma_j^{2} + 0.04)$, which incorporates a $N(0, 0.2^{2})$ prior for $\beta_j$. This
prior gives high probability to small effect sizes, and only small probability to large
effect sizes. We then calculated the posterior probability that the jth SNP is causal
by:

$$\pi_j = \frac{\mathrm{BF}_j}{\sum_k \mathrm{BF}_k}$$

in which the summation in the denominator is over all SNPs passing quality
control across the locus. We compared the meta-analysis results and credible sets
of SNPs likely to contain the causal variant as described (ref. 57). Assuming a single
causal variant at each locus, a 99% credible set of variants was then constructed by:
(1) ranking all SNPs according to their Bayes’ factor; and (2) combining ranked 
SNPs until their cumulative posterior probability exceeded 0.99. For each locus, 
we calculated the number of SNPs contained within the 99% credible sets, and the 
length of the genomic interval covered by these SNPs. 
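
The credible-set construction can be sketched as below, assuming the standard Wakefield form of the approximate Bayes' factor with prior variance 0.04. Working on the log scale is an implementation choice made here to avoid overflow; the function names and example SNPs are hypothetical.

```python
import math

PRIOR_VARIANCE = 0.04  # variance of the N(0, 0.2^2) prior on the effect


def log_approx_bayes_factor(beta, se, prior_variance=PRIOR_VARIANCE):
    """Log of the approximate Bayes' factor in favour of association."""
    r = prior_variance / (se ** 2 + prior_variance)
    return 0.5 * math.log(1.0 - r) + r * beta ** 2 / (2.0 * se ** 2)


def posterior_probabilities(log_bfs):
    """Posterior probability of causality for each SNP at one locus,
    assuming a single causal variant (BF_j divided by the sum of BFs)."""
    top = max(log_bfs.values())
    scaled = {snp: math.exp(value - top) for snp, value in log_bfs.items()}
    total = sum(scaled.values())
    return {snp: value / total for snp, value in scaled.items()}


def credible_set(posteriors, level=0.99):
    """Rank SNPs by posterior and add them until the cumulative
    posterior probability exceeds the requested level."""
    ranked = sorted(posteriors, key=posteriors.get, reverse=True)
    chosen, cumulative = [], 0.0
    for snp in ranked:
        chosen.append(snp)
        cumulative += posteriors[snp]
        if cumulative > level:
            break
    return chosen


# Hypothetical locus: (beta, standard error) for three SNPs.
locus = {"rs1": (0.05, 0.005), "rs2": (0.02, 0.005), "rs3": (0.0, 0.005)}
log_bfs = {snp: log_approx_bayes_factor(b, s) for snp, (b, s) in locus.items()}
posteriors = posterior_probabilities(log_bfs)
set_99 = credible_set(posteriors)
```

In this example one SNP carries almost all of the posterior mass, so the 99% credible set contains a single variant; at loci with many variants in strong LD the set is correspondingly larger.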

Comparison of loci across traits. To determine whether the identified loci were 
also associated with any of 22 cardio-metabolic traits, we obtained association data 
from meta-analysis consortia DIAGRAM (T2D; ref. 58), CARDIoGRAM-C4D (CAD; ref. 59),
ICBP (diastolic and systolic blood pressure; ref. 60), GIANT (BMI, height)***’, GLGC
(HDL, LDL, and triglycerides; ref. 61), MAGIC (fasting glucose, fasting insulin, fasting
insulin adjusted for BMI, and two-hour glucose; refs 62-64), ADIPOGen (BMI-adjusted
adiponectin; ref. 65), CKDgen (urine albumin-to-creatinine ratio (UACR), estimated
glomerular filtration rate (eGFR), and overall CKD; refs 66, 67), ReproGen (age at menarche,
age at menopause; refs 68, 69), and GEFOS (bone mineral density; ref. 70); others provided
association data for IgA nephropathy (ref. 71; also K.K., M.C., R.P.L. and A.G.G., unpublished
data) and for endometriosis (stage B cases only; ref. 72). Proxies (r² > 0.8 in CEU)
were used when an index SNP was unavailable.

We also searched the NHGRI GWAS catalogue for previous SNP-trait associations
near our lead SNPs (ref. 73). We supplemented the catalogue with additional genome-wide
significant SNP-trait associations from the literature'*’°’* *°. We used PLINK to
identify SNPs within 500 kb of lead SNPs using 1000 Genomes Project pilot I
genotype data and LD (r²) values from CEU (refs 81, 82); for rs7759742, HapMap release
22 CEU data (refs 81, 83) were used. All SNPs within the specified regions were compared
with the NHGRI GWAS catalogue (ref. 73).

Enrichment of concordant cross-trait associations and effects. To evaluate
whether the alleles associated with increased WHRadjBMI at the 49 identified 
SNPs convey effects for any of the 22 cardiometabolic traits, we conducted meta- 
regression analyses of the beta-estimates on these metabolic outcomes from other 
consortia with the beta-estimates for WHRadjBMI in our data®. 

On the basis of the association data across traits, we generated a matrix of 
Z-scores by dividing the association betas for each of the 49 WHRadjBMI SNPs 
for each of 22 traits by their respective standard errors. The traits did not include 
WHRadjBMI or nephropathy in Chinese subjects, but did include HIPadjBMI 
and WCadjBMI. Each Z-score was made positive if the original trait-increasing
allele also increased the look-up trait and negative if not. Missing associations
were assigned a value of zero. We performed unsupervised hierarchical clustering
of the Z-score matrix in R using the default settings of the ‘heatplot’ function from
the made4 library (version 1.20.0), agglomerating clusters using average linkage
and Pearson correlation metric distance. The rows and columns of matrix values
were each automatically scaled to range from −3 to 3. Confidence in the hierarchical
clustering was assessed by bootstrap analysis (10,000 resamplings) using the R
package ‘pvclust’ (ref. 84).
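
The construction of the signed Z-score matrix (before clustering) can be sketched as follows. The clustering itself, done in R with made4 and pvclust, is not reproduced here; the input layout, function name and example look-ups are assumptions.

```python
def signed_z_matrix(snps, traits, lookups, whr_increasing_allele):
    """Build a SNP-by-trait matrix of Z-scores aligned to the
    WHRadjBMI-increasing allele.

    lookups maps (snp, trait) to (beta, se, effect_allele) from the other
    consortium. The Z-score is positive if the WHRadjBMI-increasing allele
    also increases the look-up trait; missing look-ups are set to zero.
    """
    matrix = []
    for snp in snps:
        row = []
        for trait in traits:
            record = lookups.get((snp, trait))
            if record is None:
                row.append(0.0)  # missing association
                continue
            beta, se, effect_allele = record
            z = beta / se
            if effect_allele != whr_increasing_allele[snp]:
                z = -z  # flip so the sign refers to the WHRadjBMI allele
            row.append(z)
        matrix.append(row)
    return matrix


# Hypothetical look-up results for two SNPs and two traits.
lookups = {
    ("rs1", "HDL"): (-0.02, 0.01, "A"),
    ("rs1", "TG"): (0.03, 0.01, "G"),
}
alleles = {"rs1": "A", "rs2": "C"}
z_matrix = signed_z_matrix(["rs1", "rs2"], ["HDL", "TG"], lookups, alleles)
```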


Identification of candidate functional variants. The 1000 Genomes CEU pilot 
data were queried for SNPs within 500 kb and in LD (r² > 0.7, an arbitrary threshold)
with any index SNP. All identified variants were then annotated based on RefSeq
transcripts using Annovar to identify potential nonsynonymous variants near identified
association signals. The distance between each variant and the nearest transcription
start site was calculated using gene annotations from GENCODE (v.12).

To investigate whether SNPs in LD with index SNPs are also in LD with common
copy number variants (CNVs), we extracted waist trait association results for
a list of SNP proxies that are in high LD (r² > 0.8, CEU) with CNVs in European
populations as described previously’. Altogether 6,200 CNV-tagging SNPs were
used, which are estimated collectively to capture >40% of CNVs >1 kb in size.

eQTLs. We examined our lead SNPs in eQTL data sets from several sources (Supplementary
Note) for cis effects significant at P < 10⁻⁵. We then checked if the
trait-associated SNP also had the strongest association with the expression level of 
its corresponding transcript. If not, we identified a nearby SNP that had a stronger 
association with expression (peak transcript SNP) of that transcript. To check 
whether effects of the peak transcript SNP and waist trait-associated SNP over- 
lapped, we conducted conditional analyses to estimate associations between the 
waist-associated SNP and transcript level when the peak-transcript-associated 
SNP was also included in the model, and vice versa. If the association for the 
expression-associated SNP was not significant (P > 0.05) when conditioned on the 
waist-associated SNP, we concluded that the waist-associated SNP is likely to explain 
a substantial proportion of the variance in gene transcript levels in the region. 
For SNPs that passed these criteria in either women or men eQTL data sets from 
deCODE, we investigated sex heterogeneity in gene transcript levels for whole blood 
(312 men, 435 women) and subcutaneous adipose tissue (252 men, 351 women) 
based on the sex-specific beta estimates and standard errors, while accounting for 
potential correlation between the sex-specific associations’. 

Epigenomic regulatory element overlap with individual variants. We examined
overlap of regulatory elements with the 68 trait-associated variants and variants in
LD with them (r² > 0.7, 1000 Genomes phase 1 version 2 EUR; ref. 85), totalling 1,547
variants. We obtained regulatory element data sets from the ENCODE Consortium**
and Roadmap Epigenomics Project” corresponding to eight tissues selected based
on a current understanding of WHRadjBMI pathways. The 226 regulatory element
data sets included experimentally identified regions of open chromatin (DNase-seq,
FAIRE-seq), histone modification (H3K4me1, H3K27ac, H3K4me3, H3K9ac
and H3K4me2), and transcription factor binding (Supplementary Table 17). When
available, we downloaded data processed during the ENCODE Integrative Analysis”.
We processed Roadmap Epigenomics sequencing data with multiple biological
replicates using MACS2 (ref. 86) and the same Irreproducible Discovery Rate
pipeline used in the ENCODE Integrative Analysis. Roadmap Epigenomics data
with only a single replicate were processed using MACS2 alone.

Global enrichment of WHRadjBMI-associated loci in epigenomic data sets.
We performed permutation-based tests in a subset of 60 open chromatin (DNase-seq)
and histone modification (H3K27ac, H3K4me1, H3K4me3 and H3K9ac) data
sets to identify global enrichment of the WHRadjBMI-associated loci. We matched
the index SNP at each locus with 500 variants having no evidence of association
(P > 0.5, ~1.2 million total variants) with a similar distance to the nearest gene
(±11,655 bp), number of variants in LD (±8 variants), and minor allele frequency.
Using these pools, we created 10,000 sets of control variants for each of the 49 loci
and identified variants in LD (r² > 0.7) and within 1 Mb. For each SNP set, we
calculated the number of loci with at least one variant located in a regulatory region 
under the assumption that one regulatory variant is responsible for each associ- 
ation signal. We initially calculated an enrichment P value by finding the propor- 
tion of control sets for which as many or more loci overlap a regulatory element 
than the set of associated loci. For increased P value accuracy, we estimated the P 
value assuming a sum of binomial distributions to represent the number of index 
SNPs or their LD proxies that overlap a regulatory data set compared to the 500 
matched control sets. 
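
The initial permutation-based enrichment P value, the proportion of control sets with at least as many overlapping loci as the associated set, can be sketched as below. The binomial refinement mentioned above is not reproduced; the function name and the example counts are hypothetical.

```python
def empirical_enrichment_p(observed_overlap, control_overlaps):
    """Proportion of control sets in which as many or more loci overlap a
    regulatory element as in the set of associated loci."""
    if not control_overlaps:
        raise ValueError("at least one control set is required")
    as_extreme = sum(1 for count in control_overlaps
                     if count >= observed_overlap)
    return as_extreme / len(control_overlaps)


# Hypothetical example: 30 of the associated loci overlap a regulatory
# element; four control sets overlap at 10, 20, 30 and 40 loci.
p_value = empirical_enrichment_p(30, [10, 20, 30, 40])
```

With 10,000 control sets the smallest non-zero value this estimator can return is 1/10,000, which is one reason a parametric approximation is useful for very small P values.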

GRAIL. Gene Relationships Among Implicated Loci (GRAIL)”” is a text-mining
algorithm that evaluates the degree of relatedness among genes within trait regions. 
Using PubMed abstracts, a subset of genes enriched for relatedness and a set of 
keywords that suggest putative pathways are identified. To avoid potential bias 
from papers investigating candidate genes stimulated by GWAS, we restricted our 
search to abstracts published before 2006. We tested for enrichment of connectivity
in the independent SNPs that were significant in our study at P < 10⁻⁵.

MAGENTA. To investigate whether pathways including predefined sets of genes
were enriched in the lower part of the gene P value distribution for WHRadjBMI, 
we performed a pathway analysis using Magenta 2.4 (ref. 20) and SNPs present in 
both the Metabochip and GWAS meta-analyses. SNPs were assigned to a gene if 
within 110 kb upstream or 40 kb downstream of the transcript’s boundaries. The 
most significant SNP P value within this interval was adjusted for putative confounders
(gene size, number of SNPs in a gene, LD pattern) using stepwise linear
regression, creating a gene association score. If the same SNP was assigned to
multiple genes, only the gene with the lowest gene score was kept. The HLA region
was removed from further analyses owing to its high LD structure and gene density.
Each gene was then assigned pathway terms using Gene Ontology (GO), PANTHER,
Ingenuity and Kyoto Encyclopedia of Genes and Genomes (KEGG) (refs 87-90). Finally,
the genes were ranked based on their gene association score, and a modified gene- 
set enrichment analysis using MAGENTA was performed. This analysis tested for 
enrichment of gene association score ranks above a given rank cutoff (including 
5% of all genes) in a gene-set belonging to a predefined pathway term, compared 
to multiple, equally sized gene-sets that were randomly sampled from all genes in 
the genome. Around 10,000-1,000,000 gene-set permutations were performed. 
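
The rank-cutoff enrichment test can be illustrated with a simplified sketch. It is not the MAGENTA implementation: it assumes that lower gene scores are more significant, samples random gene sets uniformly without matching on gene properties, and uses hypothetical names throughout.

```python
import random


def gene_set_enrichment_p(gene_scores, gene_set, top_fraction=0.05,
                          n_permutations=10000, seed=0):
    """Permutation P value for enrichment of a gene set among top-ranked genes.

    gene_scores: dict mapping gene to association score (lower = stronger).
    gene_set: genes belonging to one predefined pathway term.
    Returns (observed count above the cutoff, empirical P value).
    """
    ranked = sorted(gene_scores, key=gene_scores.get)
    n_top = max(1, round(top_fraction * len(ranked)))
    top_genes = set(ranked[:n_top])
    members = [gene for gene in gene_set if gene in gene_scores]
    observed = sum(1 for gene in members if gene in top_genes)

    rng = random.Random(seed)
    hits = 0
    for _ in range(n_permutations):
        # Random gene set of the same size drawn from all scored genes.
        sample = rng.sample(ranked, len(members))
        if sum(1 for gene in sample if gene in top_genes) >= observed:
            hits += 1
    return observed, hits / n_permutations


# Hypothetical data: 200 genes, with the pathway made up of the 10 best-ranked.
scores = {f"g{i}": i / 200 for i in range(200)}
pathway = [f"g{i}" for i in range(10)]
observed, p_value = gene_set_enrichment_p(scores, pathway, n_permutations=1000)
```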

DEPICT. This method is described in detail elsewhere”***. In brief, DEPICT uses
gene expression data derived from a panel of 77,840 expression arrays (ref. 91), 5,984
molecular pathways (based on 169,810 high-confidence experimentally derived
protein-protein interactions; ref. 92), 2,473 phenotypic gene sets (based on 211,882
gene-phenotype pairs from the Mouse Genetics Initiative; ref. 93), 737 Reactome
pathways (ref. 94), 184 KEGG pathways (ref. 95), and 5,083 GO terms (ref. 87). DEPICT
uses the expression data to reconstitute
the protein-protein interaction gene sets, mouse phenotype gene sets, Reactome
pathway gene sets, KEGG pathway gene sets, and GO term gene sets. To avoid
biasing the identification of genes and pathways covered by SNPs on the Metabochip,
analyses were restricted to GWAS cohort data and included 226 WHRadjBMI
SNPs in 78 non-overlapping loci with sex-combined P < 10⁻⁵. We used DEPICT
to map genes to associated WHRadjBMI loci, which then allowed us to (1) systematically
identify the most likely causal gene(s) in a given associated region, (2) identify
reconstituted gene sets that were enriched in genes from associated regions, and
(3) identify tissue and cell type annotations in which genes from associated regions
were highly expressed. Associated regions were defined by all genes residing within
LD (r² > 0.5) distance of the WHRadjBMI-associated index SNPs. Overlapping
regions were merged, and SNPs that mapped near to or within the HLA region
were excluded. The 93 WHRadjBMI SNPs with P < 10⁻⁵ (clumping thresholds:
HapMap release 27 CEU r² = 0.01, 500 kb) resulted in 78 non-overlapping regions.
GWAS + Metabochip index SNPs were annotated with DEPICT-prioritized genes
if the DEPICT (GWAS-only) SNP was located within 500 kb. To mark related
gene sets, we first quantified significant gene sets’ pairwise overlap using a non- 
probabilistic version of the reconstituted gene sets and the Jaccard index measure. 
Groups of gene sets with mutual Jaccard indices >0.25 were subsequently referred 
to as meta gene sets and named by the most significant gene set in the group (Sup- 
plementary Table 18 and Fig. 2a). In Fig. 2a, b, gene sets with similarities between 
0.1 and 0.25 were connected by an edge that was scaled according to degree of 
similarity. The Cytoscape tool was used to construct parts of Fig. 2 (ref. 96). In 
Fig. 2c, we show the significance of all cell type annotations and annotations that 
were categorized as ‘tissues’ at the outermost level of the Medical Subject Heading
ontology.
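
The grouping of related gene sets by Jaccard index can be sketched as follows. The greedy rule used here (a gene set joins a group only if it exceeds the threshold with every current member, with gene sets visited from most to least significant) is an assumption about what "mutual" means; the function names and example sets are hypothetical.

```python
def jaccard_index(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_gene_sets(gene_sets, threshold=0.25):
    """Group gene sets whose pairwise Jaccard indices all exceed threshold.

    gene_sets: dict mapping name to member genes, ordered from most to least
    significant, so the first name in each group can serve as its label.
    """
    groups = []
    for name in gene_sets:
        for group in groups:
            if all(jaccard_index(gene_sets[name], gene_sets[member]) > threshold
                   for member in group):
                group.append(name)
                break
        else:
            groups.append([name])  # start a new meta gene set
    return groups


# Hypothetical gene sets: A and B share three of five genes; C is unrelated.
example_sets = {
    "A": {"g1", "g2", "g3", "g4"},
    "B": {"g2", "g3", "g4", "g5"},
    "C": {"g9", "g10"},
}
meta_gene_sets = group_gene_sets(example_sets)
```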

44. Winkler, T. W. et al. Quality control and conduct of genome-wide association
meta-analyses. Nature Protocols 9, 1192-1212 (2014).

45. Devlin, B. & Roeder, K. Genomic control for association studies. Biometrics 55,
997-1004 (1999).

46. Buyske, S. et al. Evaluation of the metabochip genotyping array in African
Americans and implications for fine mapping of GWAS-identified loci: the PAGE
study. PLoS ONE 7, e35651 (2012).

47. Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of
genomewide association scans. Bioinformatics 26, 2190-2191 (2010).

48. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and
powerful approach to multiple testing. J. R. Stat. Soc. Series B Stat. Methodol. 57,
289-300 (1995).

49. Higgins, J. P. & Thompson, S. G. Quantifying heterogeneity in a meta-analysis.
Stat. Med. 21, 1539-1558 (2002).

50. Neale, M. C., Cardon, L. R. & North Atlantic Treaty Organization. Methodology for
Genetic Studies of Twins and Families (Kluwer Academic Publishers, 1992).

51. Falconer, D. S. Introduction to Quantitative Genetics 3rd edn (Oliver and Boyd,
1990).

52. Almasy, L. & Blangero, J. Multipoint quantitative-trait linkage analysis in general
pedigrees. Am. J. Hum. Genet. 62, 1198-1211 (1998).

53. Neale, M. C. MX: Statistical Modeling 4th edn (Department of Psychiatry, 1997).

54. Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide
complex trait analysis. Am. J. Hum. Genet. 88, 76-82 (2011).

55. Frazer, K. A. et al. A second generation human haplotype map of over 3.1 million
SNPs. Nature 449, 851-861 (2007).

56. Wakefield, J. A Bayesian measure of the probability of false discovery in genetic
epidemiology studies. Am. J. Hum. Genet. 81, 208-227 (2007).

57. Wellcome Trust Case Control Consortium. Bayesian refinement of association
signals for 14 loci in 3 common diseases. Nature Genet. 44, 1294-1301 (2012).

58. Morris, A. P. et al. Large-scale association analysis provides insights into the genetic
architecture and pathophysiology of type 2 diabetes. Nature Genet. 44, 981-990
(2012).

59. Deloukas, P. et al. Large-scale association analysis identifies new risk loci for
coronary artery disease. Nature Genet. 45, 25-33 (2013).

60. Ehret, G. B. et al. Genetic variants in novel pathways influence blood pressure and
cardiovascular disease risk. Nature 478, 103-109 (2011).

61. Global Lipids Genetics Consortium. Discovery and refinement of loci associated
with lipid levels. Nature Genet. 45, 1274-1283 (2013).

62. Scott, R. A. et al. Large-scale association analyses identify new loci influencing
glycemic traits and provide insight into the underlying biological pathways. Nature
Genet. 44, 991-1005 (2012).

63. Manning, A. K. et al. A genome-wide approach accounting for body mass index
identifies genetic variants influencing fasting glycemic traits and insulin
resistance. Nature Genet. 44, 659-669 (2012).

64. Saxena, R. et al. Genetic variation in GIPR influences the glucose and insulin
responses to an oral glucose challenge. Nature Genet. 42, 142-148 (2010).

65. Dastani, Z. et al. Novel loci for adiponectin levels and their influence on type 2
diabetes and metabolic traits: a multi-ethnic meta-analysis of 45,891 individuals.
PLoS Genet. 8, e1002607 (2012).

66. Pattaro, C. et al. Genome-wide association and functional follow-up reveals new
loci for kidney function. PLoS Genet. 8, e1002584 (2012).

67. Boger, C. A. et al. CUBN is a gene locus for albuminuria. J. Am. Soc. Nephrol. 22,
555-570 (2011).

68. Stolk, L. et al. Meta-analyses identify 13 loci associated with age at menopause and
highlight DNA repair and immune pathways. Nature Genet. 44, 260-268 (2012).

69. Elks, C. E. et al. Thirty new loci for age at menarche identified by a meta-analysis of
genome-wide association studies. Nature Genet. 42, 1077-1085 (2010).

70. Estrada, K. et al. Genome-wide meta-analysis identifies 56 bone mineral density
loci and reveals 14 loci associated with risk of fracture. Nature Genet. 44, 491-501
(2012).

71. Gharavi, A. G. et al. Genome-wide association study identifies susceptibility loci for
IgA nephropathy. Nature Genet. 43, 321-327 (2011).

72. Painter, J. N. et al. Genome-wide association study identifies a locus at 7p15.2
associated with endometriosis. Nature Genet. 43, 51-54 (2011).

73. Hindorff, L. A. et al. Potential etiologic and functional implications of genome-wide
association loci for human diseases and traits. Proc. Natl Acad. Sci. USA 106,
9362-9367 (2009).

74. Kamatani, Y. et al. Genome-wide association study of hematological and
biochemical traits in a Japanese population. Nature Genet. 42, 210-215 (2010).

75. Franke, A. et al. Genome-wide meta-analysis increases to 71 the number of
confirmed Crohn’s disease susceptibility loci. Nature Genet. 42, 1118-1125
(2010).

76. Sawcer, S. et al. Genetic risk and a primary role for cell-mediated immune
mechanisms in multiple sclerosis. Nature 476, 214-219 (2011).

77. Wang, K. S., Liu, X. F. & Aragam, N. A genome-wide meta-analysis identifies novel
loci associated with schizophrenia and bipolar disorder. Schizophr. Res. 124,
192-199 (2010).

78. Cirulli, E. T. et al. Common genetic variation and performance on standardized
cognitive tests. Eur. J. Hum. Genet. 18, 815-820 (2010).

79. Gieger, C. et al. New gene functions in megakaryopoiesis and platelet formation.
Nature 480, 201-208 (2011).

80. Need, A. C. et al. A genome-wide study of common SNPs and CNVs in cognitive
performance in the CANTAB. Hum. Mol. Genet. 18, 4650-4661 (2009).

81. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-
based linkage analyses. Am. J. Hum. Genet. 81, 559-575 (2007).

82. The 1000 Genomes Project Consortium. A map of human genome variation from
population-scale sequencing. Nature 467, 1061-1073 (2010).

83. The International HapMap Project. Nature 426, 789-796 (2003).

84. Suzuki, R. & Shimodaira, H. Pvclust: an R package for assessing the uncertainty in
hierarchical clustering. Bioinformatics 22, 1540-1542 (2006).

85. 1000 Genomes Project Consortium. An integrated map of genetic variation from
1,092 human genomes. Nature 491, 56-65 (2012).

86. Feng, J., Liu, T., Qin, B., Zhang, Y. & Liu, X. S. Identifying ChIP-seq enrichment using
MACS. Nature Protocols 7, 1728-1740 (2012).

87. Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene
Ontology Consortium. Nature Genet. 25, 25-29 (2000).

88. Mi, H. & Thomas, P. PANTHER pathway: an ontology-based pathway database
coupled with data analysis tools. Methods Mol. Biol. 563, 123-140 (2009).

89. Jimenez-Marin, A., Collado-Romero, M., Ramirez-Boo, M., Arce, C. & Garrido, J. J.
Biological pathway analysis by ArrayUnlock and Ingenuity Pathway Analysis. BMC
Proc. 3, S6 (2009).

90. Kanehisa, M. & Goto, S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic
Acids Res. 28, 27-30 (2000).

91. Fehrmann, R. S. et al. Gene expression analysis identified global gene dosage
sensitivity in cancer. Nature Genet. 47, 115-125 (2015).

92. Lage, K. et al. A human phenome-interactome network of protein complexes
implicated in genetic disorders. Nature Biotechnol. 25, 309-316 (2007).

93. Bult, C. J. et al. Mouse genome informatics in a new age of biological inquiry.
IEEE Int. Symposium Bio-Informatics Biomedical Engineering 29-32 (2000).

94. Croft, D. et al. Reactome: a database of reactions, pathways and biological
processes. Nucleic Acids Res. 39, D691-D697 (2011).

95. Kanehisa, M., Goto, S., Sato, Y., Furumichi, M. & Tanabe, M. KEGG for integration
and interpretation of large-scale molecular data sets. Nucleic Acids Res. 40,
D109-D114 (2012).

96. Saito, R. et al. A travel guide to Cytoscape plugins. Nature Methods 9, 1069-1076
(2012).


©2015 Macmillan Publishers Limited. All rights reserved 



[Figure: study design flowchart. GWAS data: 57 cohorts, 142,762 subjects, 2,507,022 SNPs.
Metabochip data: 44 cohorts, 67,326 subjects, 124,196 SNPs.
Joint GWAS+MC meta-analysis: 210,088 subjects, 2,542,447 SNPs (union), 93,057 SNPs (intersection).]

Extended Data Figure 1 | Overall WHRadjBMI meta-analysis study design. 
Data (dashed lines) and analyses (solid lines) related to the GWAS cohorts for 
WHRadjBMI are coloured red and those related to the Metabochip (MC) 
cohorts are coloured blue. The two genomic control (λGC) corrections
(within-study and among-studies) performed on associations from each data
set are represented by grey-outlined circles. The λGC corrections for the GWAS
meta-analysis were based on all SNPs and the λGC corrections for the
Metabochip meta-analysis were based on a null set of 4,319 SNPs previously
associated with QT interval. The joint meta-analysis of the GWAS and
Metabochip data sets is coloured purple. All SNP counts reflect a sample size
filter of n ≥ 50,000 subjects. Additional WHRadjBMI meta-analyses included
Metabochip data from up to 14,371 subjects of east Asian, south Asian or 
African-American ancestry from eight cohorts. Counts for the meta-analyses of 
waist circumference, hip circumference, and their BMI-adjusted counterparts 
(WCadjBMI and HIPadjBMI) differ from those of WHRadjBMI because 
some cohorts only had phenotype data available for one type of body 
circumference measurement (see Supplementary Table 2). 

Extended Data Figure 2 | Women- and men-specific effects, phenotypic
variances and genetic correlations. a, Figure showing effect beta estimates for
the 20 WHRadjBMI SNPs showing significant evidence of sexual dimorphism.
Sex-specific effect betas and 95% confidence intervals for SNPs associated
with WHRadjBMI are shown as red circles and blue squares for women and
men, respectively. Sample sizes, comprising more than 73,576 men and 96,182
women, are listed in Table 1. The SNPs are classified into three categories:
(1) those showing a women-specific effect (‘women SSE’), namely a significant
effect in women and no effect in men (Pwomen < 5 × 10⁻⁸, Pmen ≥ 0.05);
(2) those showing a pronounced women effect (‘women CED’), namely a
significant effect in women and a less significant but directionally consistent
effect in men (Pwomen < 5 × 10⁻⁸, 5 × 10⁻⁸ < Pmen < 0.05); and (3) those
showing a men-specific effect (‘men SSE’), namely a significant effect in men
and no effect in women (Pmen < 5 × 10⁻⁸, Pwomen ≥ 0.05). Within each of the
three categories, the loci were sorted by increasing P value of sex-based
heterogeneity in the effect betas. b, Figure showing standardized sex-specific 
phenotypic variance components for six waist-related traits. Values are shown 
in men (M) and women (W) from the Swedish Twin Registry (n = 11,875). 
The ACE models are decomposed into additive genetic components (A) shown 
in black, common environmental components (C) in grey, and non-shared 
environmental components (E) in white. Components are shown for waist 
circumference (WC), hip circumference (HIP), WHR, WCadjBMI, 
HIPadjBMI and WHRadjBMI. When the ‘A’ component is different in men 
and women with P < 0.05 for a given trait, its name is marked with an asterisk. 
c, Genetic correlations of waist-related traits with height, adjusted for age 
and BMI. Genetic correlations of three traits with height were based on variance 
component models in the Framingham Heart Study and TwinGene study 
(see Methods). 

Extended Data Figure 3 | Cumulative genetic risk scores for WHRadjBMI
applied to the KORA study cohort. a, All subjects (n = 3,440,
Ptrend = 6.7 × 10“). b, Only women (n = 1,750, Ptrend = 1.0 × 10° |'). c, Only
men (n = 1,690, Ptrend = 0.02). Each genetic risk score illustrates the joint effect
of the WHRadjBMI-increasing alleles of the 49 identified variants from Table 1
weighted by the relative effect sizes from the applicable sex-combined or
sex-specific meta-analysis. The mean WHRadjBMI residual and 95%
confidence interval is plotted for each genetic risk score category (red dots). The

Extended Data Figure 4 | Heat map of unsupervised hierarchical clustering
of the effects of 49 WHRadjBMI SNPs on 22 anthropometric and 
metabolic traits and diseases. The matrix of Z-scores representing the set of 
associations was scaled by row (locus name) and by column (trait) to range 
from −3 to 3. Negative values (blue) indicate that the WHRadjBMI-increasing
allele was associated with decreased values of the trait and positive values (red) 
indicate that this allele was associated with increased values of the trait. 
Sample sizes for the associations are listed in Supplementary Table 8. 
Dendrograms indicating the clustering relationships are shown to the left and 
above the heat map. The WHRadjBMI-increasing alleles at the 49 lead SNPs 
segregate into three major clusters comprised of alleles that associate with:
(1) larger WCadjBMI and smaller HIPadjBMI (n = 30 SNPs); (2) taller stature 
and larger WCadjBMI (n = 8 SNPs); and (3) shorter stature and smaller 
HIPadjBMI (n = 11 SNPs). The three visually identified SNP clusters could be 
statistically distinguished with >90% confidence. Alleles of the first cluster 
were predominantly associated with lower high density lipoprotein (HDL) 
cholesterol and with higher triglycerides and fasting insulin adjusted for BMI 
(FIadjBMI). BMD, bone mineral density; eGFRcrea, estimated glomerular
filtration rate based on creatinine; LDL cholesterol, low-density lipoprotein 
cholesterol; UACR, urine albumin-to-creatinine ratio. 
Extended Data Figure 5 | Regulatory element overlap with WHRadjBMI-associated
loci. a, Five variants associated with WHRadjBMI and located
~77 kb upstream of the first CALCRL transcription start site overlap regions
with genomic evidence of regulatory activity in endothelial cells. b, Five
WHRadjBMI variants, including rs8817452, in a 1.1-kb region (box) ~250 kb
downstream of the first LEKR1 transcription start site overlap evidence of active
enhancer activity in adipose nuclei. Signal enrichment tracks are from the
ENCODE Integrative Analysis and the Roadmap Epigenomics track hubs on
the UCSC Genome Browser. Transcripts are from the GENCODE basic
annotation.
Extended Data Table 1 | WHRadjBMI loci with multiple association signals in the sex-combined and/or sex-specific approximate conditional meta-analyses

Column groups: Sex-combined, Women, Men (each with β, P, N)
Locus* SNP Position (bp) Nearest gene(s) EA† EAF β P N β P N β P N Sex diff. P‡ CEU r² with lead SNP
TBX15- —_rs2645294. «119,376,110 WARS2 T 0.6 0.031 7.60E-19 209,808 0.035 1.50E-14 116,596 0.014 2.20E-02 93,346 4.90E-03 Same 
WARS2 _s1106529 = 119,333,020 7TBX15 A 08 0.016 1.40E-03 209,930 0.021 1.10E-03 116,663 0.034 4.80E-09 1.10E-01 0.43 
[chr 1] 1s12143789 119,298,677 TBX15 C 0.2 0.026 1.00E-09 209,874 0.022 1.30E-04 116,640 0.019 2.30E-03 7.10E-01 0.06 
rs12731372 118,654,498 SPAG17 C 08 0.024 1.30E-09 209,856 0.02 110E-04 116,636 0.028 340E-06 2.80E-01 >500 kb 
GRB14- 51128249! 165,236,870 COBLL1 G 0.6 0.062 8.60E-19 209,414 0.093 1.00E-24 116,348 -0.002 7.10E-0 8.60E-22 0.93 
COBLL1 1812692737 165,262,555 COBLL1 A 0.3 0.043 1.60E-08 203,265 0.134 2.70E-26 112,317 0.003 5.70E-01 91,082 2.80E-21 0.71 
[chr 2]  rs12692738 165,266,498 COBLL1 T 0.8 0.021 5.90E-05 209,551 0,092 3.80E-20 116,474 -0.005 4.10E-0 4.70E-18 0.3 
rs17185198 165,268,482 COBLL1 A 0.8 0.002 7.40E-01 207,702 0.072 8.50E-13 115,657 -0.004 5.80E-0 8.00E-11 0.15 
PRBM1 —_s13083798 52,624,788 PRBM1 A 0.5 0.023 4.10E-11 209,128 0.013 1.20E-01 115,974 0.016 1.10E-03 7.40E-01 0.88 
[chr3]  rs12489828 52,542,054 NT5DC2 T 06 0.011 650E-02 204485 0.029 2.60E-10 112,633 -0.015 2.90E-03 91,986 7.20E-11 0.57 
MAP3K1 _s3936510 55,896,623 MAP3K1 T 0.2 0.022 1.50E-06 207,896 0.042 6.00E-12 115,645 -0.002 8.20E-01 92.386 5.90E-07 0.88 
[chr 5]  1rs459193 55,842,508 ANKRD55 A 0.3 0.026 1.60E-11 209,952 0.016 1.90E-03 116.677 0.033 6.70E-09 93,410 2.30E-02 0.06 
VEGFA _rsg98584S_ =—s-« 43,865,874. VEGFA A 0.5 0.043 1.10E-29 189,620 0.065 1.00E-35 106,771 0.018 8.20E-04 82983 3.10E-10 0.84 
[chr6] 1s4714699 43,910,541 VEGFA C 04 0.019 3.50E-07 193,327 0.028 1.00E-08 107,987 0.007 1.90E-01 85.475 4.90E-03 0.01 
RSPO3 _+s1936805$ 127,493,809 RSPO3 T 0.5 0.038 2.00E-28 209,859 0.071 6.40E-37 116,602 0.031 3.30E-10 93,392 8.40E-08 Same 
[chr 6] 1s11961815 127,477,288 RSPO3 A 08 0.022 5.00E-06 209,679 0.037 6.50E-09 116,503 0.021 3.60£-03 93,310 690E-02 0.32 
rs72959041' 127,496,586 RSPO3 <A 0.1 0.101 8.70E-15 72,472 - - - - : - : 0.05 
NFE2L3, 181534696 26,363,764 SNX10 C 04 0.011 2.00E-03 198,194 0.028 2.00E-08 111,643 -0.007 1.90E-01 86,685 2.20E-07 Same 
SNX10" 1810245353 25,825,139 NFE2L3 A 0.2 0.035 8.40E-16 210,008 0.016 1.30E£-01 116,704 0.027 140E-05 93,438 3.60E-01 Same 
[chr 7] 1s3902751 25,828,164 NFE2L3 A 0.3 0.009 2.00F-01 209,969 0.039 4.20E-14 116,676 0.019 840E-04 93.427 7.40E-03 0.608" 
HOXC13 181443512 552,628,951 HOXC13 A 0.2 0.016 2.70E-03 209.980 0.04 1.10E-14 116,688 0.012 3.00E-02 93,425 1.80E-04 Same 
[chr 12] rs10783615 52,636,040 HOXC12 G_ 0.1 0.037 6.70E-14 209,368 0.023 8.50E-03 116,356 0.022 1.80E-03 93,146 9.30E-01 0.59 
rs20714498 52,714,278 HOXC4/5/6 A 0.4 0.028 5.00E-15 206,953 0.026 4.60E-08 114,259 0.029 3.40E-08 92,829 6.60E-01 ) 
CCDC92 184765219 123,006,063 CCDC92 C 0.7 0.025 6.90E-12 209,807 0.032 2.50E-11 116,592 0.018 5.30E-04 93.350 3.80E-02 Same 
[chr 12] rs863750 123,071,397 ZNF664 T 0.6 0.022 3.90E-10 209,371 0.031 1.60E-11 116,367 0.015 4.00E-03 93,138 1.80E-02 0.02 


P values and β coefficients for the association with WHRadjBMI from the joint model in the approximate conditional analysis of combined GWAS and Metabochip studies. SNPs selected by conditional analyses as
independently associated with WHRadjBMI in a meta-analysis (sex-combined, women- or men-specific) have their respective summary statistics for these analyses marked in black and bold. SNPs not selected by
a particular conditional analysis as independently associated are marked in grey and show the association analysis results for the SNP conditioned on the locus SNPs selected by GCTA. Sample sizes are from the
unconditioned meta-analysis.
* Locus and lead SNPs are defined by Table 1.
† The effect allele is the WHRadjBMI-increasing allele in the sex-combined analysis.
‡ Test for sex difference in conditional analysis based on the effect correlation estimate from primary analyses; values significant at the table-wise Bonferroni threshold of 0.05/25 = 2 × 10⁻³ are marked in bold.
§ SNPs selected by conditional analysis in the sex-combined analysis; proxies were selected by joint conditional analysis in the women- and/or men-specific analyses.
|| SNP not present in the sex-specific meta-analyses due to sample size filter requiring n ≥ 50,000; sample size from GCTA.


At NFE2L3-SNX10, different lead SNPs were identified in the European and all-ancestry analyses but LD is reported with respect to rs10245353. 
Extended Data Table 2 | Enrichments of 49 WHRadjBMI signal SNPs with metabolic and anthropometric traits

Column groups: SNPs in concordant direction (N, Total, P); SNPs in concordant direction with P < 0.05 (N, Total, P)
Trait Max. sample size N Total P N Total P
Type 2 diabetes (T2D) 86,200 37 49 2.35E-04 16 49 3.56E-14 
Fasting glucose (FG) 132,996 35 49 1.90E-03 8 49 2.75E-05 
Fasting insulin adjusted for BMI (FladjBMI) 103,496 45 49 4.11E-10 36 49 4.04E-47 
2-hour glucose (G120) 42,853 33 49 1.06E-02 7 49 2.09E-04 
Diastolic blood pressure (DBP) 69,760 34 49 4.70E-03 10 49 3.21E-07 
Systolic blood pressure (SBP) 69,774 38 49 7.10E-05 6 49 1.36E-03 
Body mass index (BMI) 322,120 40 49 4.63E-06 23 49 4.42E-24 
Height 253,209 25 49 5.00E-01 14 49 1.10E-11 
High-density lipoprotein cholesterol (HDL-C) 187,142 45 49 4.11E-10 24 49 1.22E-25 
Low-density lipoprotein cholesterol (LDL-C) 173,067 33 49 1.06E-02 12 49 2.32E-09 
Triglycerides (TG) 177,838 46 49 3.49E-11 29 49 6.02E-34 
Adiponectin 29,347 41 49 9.82E-07 20 49 1.28E-19 
Endometriosis 1,364/7,060 24 45 3.83E-01 4 45 2.58E-02 
Nephropathy (in Chinese subjects) 1,194/902 18 43 8.89E-01 0 43 1.00E+00 
Nephropathy (in Italian subjects) 1,045/1,340 20 43 7.29E-01 1 43 6.63E-01 
Estimated glomerular filtration rate of creatinine (eGFRcrea) 74,354 29 49 1.26E-01 3 49 1.24E-01 
Chronic kidney disease (CKD) 74,354 17 49 9.89E-01 2 49 3.47E-01 
Urine albumin-to-creatinine ratio (UACR) 31,580 22 49 8.04E-01 2 49 3.47E-01 
Menopause 87,802 28 49 1.96E-01 1 49 7.11E-01 
Menarche 38,968 23 49 7.16E-01 2 49 3.47E-01 
Coronary artery disease (CAD) 191,198 27 48 2.35E-01 9 48 2.64E-06 
Femoral neck bone mineral density (FN-BMD) 32,960 25 49 5.00E-01 4 49 3.40E-02 
Lumbar spine bone mineral density (LS-BMD) 31,798 28 49 1.96E-01 3 49 1.24E-01 


The 49 WHRadjBMI SNPs were tested for association with other traits by GWAS meta-analyses performed by other groups (see Methods). The maximum sample size available is shown overall or separately for 
cases/controls. N indicates the number of the total SNPs for which the WHRadjBMI-increasing allele is associated with the trait in the concordant direction (increased levels, except for HDL-C, adiponectin and BMI). 
One-sided binomial P values test whether this number is greater than expected by chance (null P= 0.5 and null P= 0.025, respectively). The tests do not account for correlation between WHRadjBMI and the tested 
traits. P values representing significant column-wise enrichment (P< 0.05/23 tests) are marked in bold. 
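
The one-sided binomial tests described in this legend can be reproduced from the table columns alone. The sketch below is a minimal illustration, not the authors' code: the function name `binomial_upper_tail` is hypothetical, and the two rows are copied from the table above. It computes the upper-tail probability of seeing at least N concordant SNPs out of the total under a null probability of 0.5 (direction only) or 0.025 (concordant direction with P < 0.05).

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# (trait, concordant N, total SNPs, concordant N with P < 0.05), copied from the table
rows = [
    ("Type 2 diabetes", 37, 49, 16),
    ("Height", 25, 49, 14),
]

# Column-wise Bonferroni threshold quoted in the legend (23 traits tested)
bonferroni = 0.05 / 23

for trait, n_concordant, total, n_nominal in rows:
    p_direction = binomial_upper_tail(n_concordant, total, 0.5)
    p_nominal = binomial_upper_tail(n_nominal, total, 0.025)
    print(f"{trait}: P(direction) = {p_direction:.3g}, "
          f"P(direction and P < 0.05) = {p_nominal:.3g}, "
          f"direction test passes Bonferroni: {p_direction < bonferroni}")
```

As the legend notes, these tests treat SNPs as independent trials and do not account for correlation between WHRadjBMI and the tested traits.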
Extended Data Table 3 | Enrichment of 49 WHRadjBMI-associated loci in epigenomic data sets


Sample Tissue DNase I HS H3K4me1 H3K27ac H3K4me3 H3K9ac 
Adipose Nuclei Adipose - 9.6E-06 1.2E-13 0.0051 0.0010 
GM12878 Blood 0.029 0.032 0.32 0.050 0.030 
Osteoblasts Bone 0.082 4.1E-06 1.8E-04 9.9E-04 - 
Astrocytes Brain 0.013 0.0044 0.0077 0.0047 
Anterior Caudate Brain - 2.9E-04 0.026 0.018 0.015
Mid Frontal Lobe Brain - 0.029 0.023 0.023 0.036
Substantia Nigra Brain - 0.047 - 0.023 0.045
Cerebellum Brain 0.048 - - - -
Cerebrum Frontal Brain 0.054 - - - -
Frontal Cortex Brain 0.022 - - - -
HUVEC Endothelial 5.0E-05 0.011 0.0011 0.023 0.040
Adult Liver Liver - 0.0057 - 0.15 0.29
HepG2 Liver 0.015 7.7E-05 0.023 5.0E-04 0.085
Hepatocytes Liver 0.59 - - - -
Huh-7 Liver 0.0024 - - - -
Myocyte Muscle 2.9E-04 1.3E-04 0.0026 0.015 0.0041
PSOAS Muscle 0.0012 - - - -
Skeletal Muscle Muscle - 7.3E-04 7.8E-05 0.0075 0.25
Pancreatic Islet Pancreatic Islets 0.40 0.68 - 0.37 0.61


Enrichment of WHRadjBMI-associated loci in regulatory elements from selected WHRadjBMI-relevant tissues. P values are derived using a sum of binomial distributions (see Methods). P values below a Bonferroni-corrected
threshold for 60 tests of 8.3 × 10⁻⁴ are indicated in bold font. The binomial-based P values are similar to P values generated from 10,000 permutation tests. Dashes indicate that data sets were not
available.
Extended Data Table 4 | Candidate genes at new loci associated with additional waist and hip-related traits

SNP Trait Locus
rs10925060 WCadjBMI OR2W5-NLRP3
rs10929925 HIP SOX11
rs2124969 WCadjBMI ITGB6
rs1664789 WCadjBMI ARL15
rs17472426 WCadjBMI CCNJL
rs722585 HIPadjBMI GMDS
rs7739232 HIPadjBMI KLHL31
rs1144 WCadjBMI SRPK2
rs13241538 HIPadjBMI KLF14
rs2398893 WHR PTPDC1
rs7044106 HIPadjBMI CS
rs11607976 HIP MYEOV
rs1784203 WCadjBMI KIAA1731
rs1394461 WHR CNTN5
rs319564 WHR GPC6
rs4985155 HIP PDXDC1
rs2047937 WCadjBMI ZNF423
rs2034088 HIPadjBMI VPS53
rs1053593 HIPadjBMI HMGXB4

Candidate genes for loci shown on Table 3 based on secondary analyses or literature review. Further details are provided in other Supplementary Tables and the Supplementary Note. Loci are shown in order of chromosome and position.

Expression QTL 
(P <10°) 


PLA2R1 (SAT) 


KLHL31 (SAT) 


SRPK2 (LCL), 
MLL5 (Omental) 
KLF14 (SAT) 


PDXDC1 (SAT) 


VPS53 (Liver, SAT), 
FAM101B (Omental, 
SAT) 

TOM1 (PBMC), 
HMGXB4 (Blood, 
SAT) 


* Gene transcript levels associated with SNP genotype (eQTL) in the indicated tissue(s).
† Genes in pathways identified as enriched by GRAIL analysis.
‡ Strongest candidate genes identified based on manual literature review.
§ Traits associated at P < 5 × 10⁻⁸ in GWAS lookups or in the GWAS catalogue using the index SNP or a proxy in high LD (r² > 0.7), and the gene(s) named in those reports.
|| Non-synonymous variants (nsSNPs) and copy number variants (CNVs) with tag SNPs in high LD with index SNP based on a 1000 Genomes CEU reference panel. DEPICT analysis was not performed for loci associated with these traits.


GRAIL
(P < 0.05)†


Literature‡


NLRP3 


SOX11 


KLHL31-GCLC- 
ELVOL 


KLF14 


FGF19-FGF4-FGF3 


GPC6 
PLA2G10-NTAN1 


ZNF423-CNEP1R1 


HMGXB4 
Other GWAS
signals§


Idiopathic 
membranous 
nephropathy 

(PLA2R1, LY75, 
ITGB6, RBMS1) 


HDL cholesterol, 
Triglycerides, 
Type 2 diabetes: 
KLF14 


Femoral neck 
bone mineral 
density, Lumbar 
spine bone 
mineral density, 
Plasma 
phospholipid 
levels, Metabolic 
traits, Height: 
PDXDC1, 
NTAN1 


nsSNPs and CNVs
(r² > 0.7)||


NTAN1 (S287P), 
NTAN1 (H283N) 


HMGXB4 (G165V), 
CNVR8147.1 
doi:10.1038/nature14177


Genetic studies of body mass index yield 
new insights for obesity biology 


A list of authors and their affiliations appears at the end of the paper 


Obesity is heritable and predisposes to many diseases. To understand the genetic basis of obesity better, here we conduct
a genome-wide association study and Metabochip meta-analysis of body mass index (BMI), a measure commonly used to
define obesity and assess adiposity, in up to 339,224 individuals. This analysis identifies 97 BMI-associated loci (P < 5 × 10⁻⁸),
56 of which are novel. Five loci demonstrate clear evidence of several independent association signals, and many loci have
significant effects on other metabolic phenotypes. The 97 loci account for ~2.7% of BMI variation, and genome-wide
estimates suggest that common variation accounts for >20% of BMI variation. Pathway analyses provide strong support for
a role of the central nervous system in obesity susceptibility and implicate new genes and pathways, including those related
to synaptic function, glutamate signalling, insulin secretion/action, energy metabolism, lipid biology and adipogenesis.


Obesity is a worldwide epidemic associated with increased morbidity 
and mortality that imposes an enormous burden on individual and pub- 
lic health. Around 40-70% of inter-individual variability in BMI, com- 
monly used to assess obesity, has been attributed to genetic factors’’. At 
least 77 loci have previously been associated with an obesity measure’, 32 
loci from our previous meta-analysis of BMI genome-wide association 
studies (GWAS)°. Nevertheless, most of the genetic variability in BMI 
remains unexplained. Moreover, although analyses of previous genetic 
association results have suggested intriguing biological processes under- 
lying obesity susceptibility, few specific genes supported these pathways”*. 
For the vast majority of loci, the probable causal gene(s) and pathways 
remain unknown. 

To expand the catalogue of BMI susceptibility loci and gain a better 
understanding of the genes and biological pathways influencing obesity, 
we performed the largest GWAS meta-analysis for BMI so far. This work 
doubles the number of individuals contributing GWAS results, incor- 
porates results from > 100,000 individuals genotyped with Metabochip’, 
and nearly doubles the number of BMI-associated loci. Comprehensive 
assessment of meta-analysis results provides several lines of evidence 
supporting candidate genes at many loci and highlights pathways that 
reinforce and expand our understanding of biological processes under- 
lying obesity. 


Identification of 97 genome-wide significant loci 


This BMI meta-analysis included association results for up to 339,224 
individuals from 125 studies, 82 with GWAS results (n = 236,231) and
43 with results from Metabochip (n = 103,047; Extended Data Table 1
and Supplementary Tables 1-3). After regression on age and sex and in- 
verse normal transformation of the residuals, we carried out association 
analyses with genotypes or imputed genotype dosages. GWAS were 
meta-analysed together, as were Metabochip studies, followed by a com- 
bined GWAS plus Metabochip meta-analysis. In total, we analysed data 
from 322,154 individuals of European descent and 17,072 individuals of 
non-European descent (Extended Data Fig. 1). 
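
The phenotype preparation described above (regression on covariates, then inverse normal transformation of the residuals) can be illustrated with a short sketch. This is an assumption-laden example rather than the consortium's pipeline: it covers only the transformation step, takes the residuals as given, and uses a rank-based transform with the Blom offset (c = 3/8), which is a common choice that the text does not specify. The function name `inverse_normal_transform` is hypothetical.

```python
from statistics import NormalDist


def inverse_normal_transform(values, c=0.375):
    """Rank-based inverse normal transformation.

    Each value is replaced by the standard-normal quantile of
    (rank - c) / (n - 2c + 1); tied values share their average rank.
    The offset c = 3/8 (Blom) is an assumption, not taken from the paper.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


# Hypothetical BMI residuals after regressing out age and sex
residuals = [0.4, -1.2, 3.0, 0.1, -0.3]
print(inverse_normal_transform(residuals))
```

The transform preserves the ordering of individuals while forcing the trait onto a standard normal scale, so effect sizes (betas) from different studies are reported in comparable standard-deviation units before meta-analysis.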

Our primary meta-analysis of European-descent individuals from 
GWAS and Metabochip studies (n = 322,154) identified 77 loci reach- 
ing genome-wide significance (GWS) and separated by at least 500 kilo- 
bases (kb) (Table 1, Extended Data Table 2 and Supplementary Figs 1 
and 2). We carried out additional analyses to explore the effects of power 
and heterogeneity. The inclusion of 17,072 non-European-descent in- 
dividuals (total n = 339,224) identified ten more loci, while secondary 


analyses identified another ten GWS loci (Table 2, Supplementary 
Tables 4-8 and Supplementary Figs 3-9). Of the 97 BMI-associated loci, 
41 have previously been associated with one or more obesity measure**. 
Thus, our current analyses identified 56 novel loci associated with BMI 
(Tables 1 and 2 and Extended Data Table 2). 


Effects of associated loci on BMI 


Newly identified loci generally have lower minor allele frequency and/or 
smaller effect size estimates than previously known loci (Extended Data 
Fig. 2a, b). On the basis of effect estimates in the discovery data set, which 
may be inflated owing to winner’s curse, the 97 loci account for 2.7% of 
BMI phenotypic variance (Supplementary Table 4 and Extended Data 
Fig. 2a, b). We conservatively used only GWS single nucleotide poly- 
morphisms (SNPs) after strict double genomic control correction, which 
probably over-corrects association statistics given the lack of evidence 
for population stratification in family-based analyses'* (Extended Data 
Fig. 3 and Extended Data Table 1). Polygene analyses suggest that SNPs 
with P values well below GWS add significantly to the phenotypic 
variance explained. For example, 2,346 SNPs selected from conditional 
and joint multiple-SNP analysis with P< 5 X 10” * explained 6.6 ± 1.1%
(mean ± s.e.m.) of variance, compared to 21.6 ± 2.2% explained by
all HapMap3 SNPs (31-54% of heritability; Fig. 1a). Furthermore, of
1,909 independent SNPs (pairwise distance >500 kb and r² < 0.1)
included on Metabochip for replication of suggestive BMI associa- 
tions, 1,458 (76.4%) have directionally consistent effects with our pre- 
vious GWAS meta-analysis’ and the non-overlapping samples in the 
current meta-analysis (Extended Data Fig. 2c). On the basis of the 
significant excess of these directionally consistent observations (sign
test P = 2.5 × 10⁻¹²³), we estimate ~1,007 of the 1,909 SNPs represent
true BMI associations.
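
The sign test and the estimate of true associations in the paragraph above can be checked with exact integer arithmetic. The sketch below is illustrative only. The function names are hypothetical, the test is taken to be two-sided, and the estimate assumes that every true association is directionally consistent while null SNPs agree in direction half the time; that reading reproduces the quoted figure but is not stated in the text.

```python
from fractions import Fraction
from math import comb


def sign_test_two_sided(k: int, n: int) -> float:
    """Exact two-sided sign test P value for k concordant signs out of n."""
    extreme = max(k, n - k)
    tail = sum(comb(n, i) for i in range(extreme, n + 1))
    # Exact rational arithmetic avoids underflow in the individual terms
    return float(min(Fraction(2 * tail, 2**n), Fraction(1)))


def estimated_true_signals(k: int, n: int) -> int:
    """Solve k = T + (n - T) / 2 for T (assumption described above)."""
    return 2 * k - n


concordant, total = 1458, 1909
print(f"Concordant fraction: {concordant / total:.1%}")
print(f"Sign test P: {sign_test_two_sided(concordant, total):.2e}")
print(f"Estimated true associations: {estimated_true_signals(concordant, total)}")
```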

We compared the effects of our 97 BMI-associated SNPs between the 
sexes, between ethnicities, and across several cross-sections of our data 
(Supplementary Tables 4-11 and Extended Data Fig. 4). Two previously 
identified loci, near SEC16B (P = 5.2 X 10°) and ZFP64 (P = 9.1 X 10”),
showed evidence of heterogeneity between men and women. Both have
stronger effects in women (Supplementary Table 10). Two SNPs, near
NEGR1 (P = 9.1 X 10 °) and PRKD1 (P = 1.9 X 10° °), exhibited significant
evidence for heterogeneity of effect between European- and
African-descent samples, and one SNP, near GBE1 (P = 1.3 X 10 *),
exhibited evidence for heterogeneity between European and east Asian
individuals (Supplementary Table 9). These findings may reflect true 
Table 1 | Novel GWS BMI loci in European meta-analysis


SNP Chr:position Notable gene(s)* Alleles EAF B s.e.m. P value 

rs657452 1:49,362,434 AGBL4(N) A/G 0.394 0.023 0.003 548x108 
rs12286929 11:114,527,614 CADM1(N) G/A 0.523 0.022 0.003 1.31 x 10°14 
rs7903146 10:114,748,339 TCF7L2(B,N) C/T 0.713 0.023 0.003 1.11 x 10713 
rs10132280 14:24,998,019 STXBP6(N) C/A 0.682 0.023 0.003 1.14 x 10713 
rs17094222 10:102,385,430 HIF1AN(N) C/T 0.211 0.025 0.004 5.94 x1071! 
rs7599312 2:213,121,476 ERBB4(D,N) G/A 0.724 0.022 0.003 1.17 x 10°>1° 
rs2365389 3:61,211,502 FHIT(N) C/T 0.582 0.020 0.003 1.63 x 10-10
rs2820292 1:200,050,910 NAV1(N) C/A 0.555 0.020 0.003 1.83 x 10°1° 
rs12885454 14:28,806,589 PRKD1(N) C/A 0.642 0.021 0.003 1.94 x10°1° 
rs16851483 3:142,758,126 RASA2(N) T/G 0.066 0.048 0.008 3.55 x 10° 1° 
rs1167827 7:75,001,105 HIP1(B,N); PMS2L3(B,Q); PMS2P5(Q); WBSCR16(Q) G/A 0.553 0.020 0.003 6.33 x 10-10
rs758747 16:3,567,359 NLRC3(N) T/C 0.265 0.023 0.004 7.47 x 10-10
rs1928295 9:119,418,304 TLR4(B,N) T/C 0.548 0.019 0.003 7.91 x 10-10
rs9925964 16:31,037,396 KAT8(N); ZNF646(M,Q); VKORC1(Q); ZNF668(Q); STX1B(D); FBXL19(D) A/G 0.620 0.019 0.003 8.11 x 10-10

rs11126666 2:26,782,315 KCNK3(D,N) A/G 0.283 0.021 0.003 1.33 x 10°? 
rs2650492 16:28,240,912 SBK1(D,N); APOBR(B) A/G 0.303 0.021 0.004 1.92 x 10-9 
rs6804842 3:25,081,441 RARB(B) G/A 0.575 0.019 0.003 2.48 x 10-9 
rs4740619 9:15,624,326 C9orf93(C,M,N) T/C 0.542 0.018 0.003 4.56 x 10-9 
rs13191362 6:162,953,340 PARK2(B,D,N) A/G 0.879 0.028 0.005 7.34 x 1079 
rs3736485 15:49,535,902 SCG3(B,D); DMXL2(M,N) A/G 0.454 0.018 0.003 7.41 x 10-9
rs17001654 4:77,348,592 NUP54(M); SCARB2(Q,N) G/C 0.153 0.031 0.005 7.76 X 10-9 
rs11191560 10:104,859,028 NT5C2(N); CYP17A1(B); SFXN2(Q) C/T 0.089 0.031 0.005 8.45 x 107-9 
rs1528435 2:181,259,207 UBE2E3(N) T/C 0.631 0.018 0.003 1.20 x 10° 
rs1000940 17:5,223,976 RABEP1(N) G/A 0.320 0.019 0.003 1.28 x 10-8 
rs2033529 6:40,456,631 TDRG1(N); LRFN2(D) G/A 0.293 0.019 0.003 1.39 x10°8 
rs11583200 1:50,332,407 ELAVL4(B,D,N,Q) C/T 0.396 0.018 0.003 1.48 x 10° 
rs9400239 6:109,084,356 FOXO3(B,N); HSS00296402(Q) C/T 0.688 0.019 0.003 1.61 x 10-8
rs10733682 9:128,500,735 LMX1B(B,N) A/G 0.478 0.017 0.003 1.83 x 10° 
rs11688816 2:62,906,552 EHBP1(B,N) G/A 0.525 0.017 0.003 1.89 x 10° 
rs11057405 12:121,347,850 CLIP1(N) G/A 0.901 0.031 0.006 2.02 x10 8 
rs11727676 4:145,878,514 HHIP(B,N) T/G 0.910 0.036 0.006 2.55 x 10-8
rs3849570 3:81,874,802 GBE1(B,M,N) A/C 0.359 0.019 0.003 2.60 x 10-8
rs6477694 9:110,972,163 EPB41L4B(N); C9orf4(D) C/T 0.365 0.017 0.003 2.67 x 10-8
rs7899106 10:87,400,884 GRID1(B,N) G/A 0.052 0.040 0.007 2.96 x 10-8
rs2176598 11:43,820,854 HSD17B12(B,M,N) T/C 0.251 0.020 0.004 2.97 x10 8 
rs2245368 7:76,446,079 PMS2L11(N) C/T 0.180 0.032 0.006 3.19 x 10°® 
rs17724992 19:18,315,825 GDF15(B); PGPEP1(Q,N) A/G 0.746 0.019 0.004 3.42 x10°® 
rs7243357 18:55,034,299 GRP(B,G,N) T/G 0.812 0.022 0.004 3.86 x 108 
rs2033732 8:85,242,264 RALYL(D,N) C/T 0.747 0.019 0.004 4.89 x 10-8


GWS is defined as P < 5 × 10⁻⁸. SNP positions are reported according to Build 36 and their alleles are coded based on the positive strand. Alleles (effect/other), effect allele frequency (EAF), beta (β), standard error
of the mean (s.e.m.) and P values are based on the meta-analysis of GWAS I+II+Metabochip association data from the European sex-combined data set.

* Notable genes from biological relevance to obesity (B); copy number variation (C); DEPICT analyses (D); GRAIL results (G); BMI-associated variant is in strong LD (r² ≥ 0.7) with a missense variant in the indicated
gene (M); gene nearest to index SNP (N); association and eQTL data converge to affect gene expression (Q).


heterogeneity at these loci, but are most likely due to linkage disequilib- 
rium (LD) differences across ancestries. Effect estimates for 79% of BMI- 
associated SNPs in African-descent samples (P = 9.2 X 10°’) and 91% 
in east Asian samples (P = 1.8 × 10⁻¹⁵) showed directional consistency
with our European-only analyses. These results suggest that common 
BMI-associated SNPs have comparable effects across ancestries and 
between sexes. In additional heterogeneity analyses, we detected an 
influence of ascertainment at TCF7L2 (stronger effects in type 2 diabetes 
case/control studies than in population-based studies); however, we saw 
no evidence of systematic ascertainment bias at other loci owing to 
inclusion of case/control studies (Supplementary Tables 10 and 11). 

We also took advantage of LD differences across populations to 
fine-map association signals using Bayesian methods'*’*. At 10 of 27 
loci fine-mapped for BMI on Metabochip, the addition of non-European 
individuals into the meta-analysis either narrowed the genomic region 
containing the 99% credible set, or decreased the number of SNPs in the 
credible set (Supplementary Table 12 and Supplementary Fig. 10). At the 
SEC16B and FTO loci, the all-ancestries credible set includes a single
SNP, although the SNP we highlight at FTO (rs1558902) differs from that
identified by a recent fine-mapping effort in African-American cohorts’®. 
Fine-mapping efforts using larger, more diverse study samples and more 
complete catalogues of variants will help to further narrow association 
signals. 

We examined the combined effects of lead SNPs at the 97 loci in an 
independent sample of 8,164 European-descent individuals from the 
Health and Retirement Study'’. We observed an average increase of 
0.1 BMI units (kg per m²) per BMI-increasing allele, equivalent to 260–320 g
for an individual 160–180 cm in height. There was a 1.8 kg per m²
difference in mean BMI between the 145 individuals (1.78%) carrying
the most BMI-increasing alleles (>104) and those carrying the mean
number of BMI-increasing alleles in the sample (91; Extended Data
Fig. 2d), corresponding to a difference of 4.6–5.8 kg for an individual
160–180 cm in height, and a 1.5 kg per m² difference (3.8–4.9 kg difference) in
mean BMI between the 95 individuals (1.16%) carrying the least BMI-increasing
alleles (<78) and those carrying the mean number. Such
differences are medically significant in predisposing to development of
metabolic disease’*. For predicting obesity (BMI ≥ 30 kg per m²), adding
genetic risk score to a model including age, age squared, sex and
four genotype-based principal components slightly, but significantly,
increases the area under the receiver-operating characteristic curve from
0.576 to 0.601.
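
The conversions between BMI units and body weight quoted above follow directly from the definition BMI = weight / height². The sketch below is a simple arithmetic check with a hypothetical helper name, using the per-allele effect and group differences stated in the paragraph.

```python
def bmi_units_to_kg(delta_bmi: float, height_m: float) -> float:
    """Weight difference (kg) implied by a BMI difference at a given height."""
    return delta_bmi * height_m**2


heights_m = (1.60, 1.80)

# Per-allele effect of 0.1 kg per m², expressed in grams
per_allele_g = [bmi_units_to_kg(0.1, h) * 1000 for h in heights_m]
print("Per allele (g):", [round(g) for g in per_allele_g])

# Group differences of 1.8 and 1.5 kg per m², expressed in kilograms
for delta in (1.8, 1.5):
    kg = [round(bmi_units_to_kg(delta, h), 1) for h in heights_m]
    print(f"{delta} kg per m2 difference (kg):", kg)
```

The per-allele values come out slightly off the round numbers in the text (about 256 g and 324 g), consistent with the paper quoting them to two significant figures.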


Additional associated variants at BMI loci 


To identify additional SNPs with independent BMI associations at the 97 
established loci, we used genome-wide complex trait analysis (GCTA)”” 
to perform approximate joint and conditional association analysis” using 
summary statistics from European sex-combined meta-analysis after re- 
moving family-based validation studies (TwinGene and QIMR). GCTA 
confirmed two signals at MC4R previously identified using exact con- 
ditional analyses” , and identified five loci with evidence of independent 
associations (Table 3): second signals near LINC01122, NLRC3-ADCY9,
GPRC5B-GP2 and BDNF, and a third signal near MC4R (rs9944545,


©2015 Macmillan Publishers Limited. All rights reserved 


Table 2 | GWS BMI loci from secondary analyses 




SNP Chr:position Notable gene(s)* Alleles EAF B s.e.m. P value Analysis 
Novel loci 
rs9641123 7:93,035,668 CALCR(B,N); hsa-miR-653(Q) C/G 0.430 0.029 0.005 2.08 x 1071° EPB 
rs7164727 15:70,881,044 LOC100287559(N); BBS4(B,M,Q) T/C 0.671 0.019 0.003 3.92 x 10-9 All 
rs492400 2:219,057,996 PLCD4(B,Q); CYP27A1(B); USP37(N); C/T 0.424 0.024 0.004 6.78 x 1072 Men 
TTLL4(M,Q); STK36(B,M); ZNF142(M); 
RQCD1(Q) 
rs2080454 16:47,620,091 CBLN1(N) C/A 0.413. 0.017 0.003 8.60 x 10°? All 
rs7239883 18:38,401,669 LOC284260(N); RIT2(B,D) G/A 0.391 0.023 0.004 1.51 x 10-8 Women 
rs2836754 21:39,213,610 ETS2(N) C/T 0.599 0.017 0.003 1.61 x 10-8 All
rs9914578 17:1,951,886 SMG6(D,N); N29617(Q) G/C 0.229 0.020 0.004 2.07 x 10-8 All 
rs977747 1:47,457,264 TAL1(N) T/G 0.403 0.017 0.003 2.18 x 10-8 All 
rs9374842 6:120,227,364 LOC285762(N); T/C 0.744 0.023 0.004 2.67 x 10-8 EPB 
rs4787491 16:29,922,838 MAPK3(D); KCTD13(D); INO80E(N); G/A 0.510 0.022 0.004 2.70 x 10-8 EPB 
TAOK2(D); YPEL3(D); DOC2A(D); 
FAM57B(D) 
rs1441264 13:78,478,920 MIR548A2(N) A/G 0.613. 0.017 0.003 2.96 x 10-8 All 
rs17203016 2:207,963,763 CREB1(B,N); KLF7(B) G/A 0.195 0.021 0.004 3.41 x 108 All 
rs16907751 8:81,538,012 ZBTB10(N) C/T 0.913. 0.047 0.009 3.89 x 10°8 Men 
rs13201877 6:137,717,234 IFNGR1(N); OLIG3(G) G/A 0.140 0.024 0.004 4.29 x 10-8 All 
rs9540493 13:65,103,705 MIR548X2(N); PCDH9(D) A/G 0.452 0.021 0.004 4.97 x 10-8 EPB
rs1460676 2:164,275,935 FIGN(N) C/T 0.179 0.021 0.004 4.98 x 10° All 
rs6465468 7:95,007,450 ASB4(B,N) T/G 0.306 0.025 0.005 4.98 x 10° Women 
Previously identified loci 
rs6091540 20:50,521,269 ZFP64(N) C/T 0.721 0.030 0.004 2.15 x1071! Women 
rs7715256 5:153,518,086 GALNT10(N) G/T 0.422 0.017 0.003 8.85 x 10° All 
rs2176040 2:226,801,046 LOC646736(N); IRS1(B,Q) A/G 0.365 0.024 0.004 9.99 x 10° Men
SNP positions are reported according to Build 36 and their alleles are coded based on the positive strand. Alleles (effect/other), EAF, beta (β), s.e.m. and P values are based on the meta-analysis of GWAS
I+II+Metabochip association data from the data set shown in the ‘Analysis’ column. EPB denotes European population-based studies, ‘All’ denotes all ancestries.

* Notable genes from biological relevance to obesity (B); copy number variation (C); DEPICT analyses (D); GRAIL results (G); BMI-associated variant is in strong LD (r² ≥ 0.7) with a missense variant in the indicated gene (M); gene nearest to the index SNP (N); association and eQTL data converge to affect gene expression (Q).


Fig. 1b). Joint conditional analyses at two genomic regions separated by 
>500 kb (the AGBL4-ELAVL4 regions on chr. 1, and the ATP2A1-SBK1 
regions on chr. 16), indicate that these pairs of signals may not be inde- 
pendent owing to extended LD. 


Effects of BMI variants on other traits 

We tested for associations between our 97 BMI-associated index SNPs 
and other metabolic phenotypes (Supplementary Tables 13-15 and 
Extended Data Figs 5 and 6). Thirteen of the twenty-three phenotypes 


tested had significantly more SNPs with effects in the anticipated dir- 
ection than expected by chance (Supplementary Table 16). These results 
corroborate the epidemiological relationships of BMI with metabolic 
traits. Whether this reflects a common genetic aetiology or a causal rela- 
tionship of BMI on these traits requires further investigation. 
Interestingly, some loci showed significant association with traits in the 
opposite direction than expected based on their phenotypic correlation 
with BMI (Extended Data Fig. 5). For example, at HHIP, the BMI- 
increasing allele is associated with decreased type 2 diabetes risk and 
Figure 1 | Cumulative variance explained and example of secondary signals.
a, The estimated variance in BMI explained by SNPs selected at a range of
P values using unrelated individuals from the QIMR (n = 3,924; purple) and
TwinGene (n = 5,668; gold), their weighted average (cyan), inferred from
within-family prediction (red; Extended Data Fig. 2), and by all HapMap phase
III SNPs in 16,275 unrelated individuals from the QIMR, TwinGene and ARIC
studies (orange). b, Plot of the region surrounding MC4R (ref. 36). SNP
associations from the European sex-combined meta-analysis are plotted with
joint conditional P values (Pj) indicated for the three conditionally significant
signals. SNPs are shaded and shaped based on the index SNP with which
they are in strongest LD (rs6567160 in blue, rs9944545 in yellow and rs17066842
in green).
Table 3 | Secondary signals reaching GWS by conditional analysis


SNP Chr:position Nearest gene Alleles EAF β s.e.m. Variance explained P value
rs1016287 2:59159129 LINC01122 T/C 0.294 0.023 0.003 0.021% 2.62 x 10-11
rs4671328 2:58788786 LINC01122 T/G 0.457 0.021 0.004 0.021% 2.73 x 10-8
rs758747 16:3567359 NLRC3 T/C 0.241 0.022 0.004 0.018% 2.00 x 107° 
rs879620 16:3955730 ADCY9 T/C 0.620 0.024 0.004 0.027% 2.17 x 10°° 
rs12446632 16:19842890 GPRC5B G/A 0.860 0.036 0.005 0.031% 1.06 x 10°14 
rs11074446 16:20162624 GP2 T/C 0.867 0.029 0.005 0.019% 1:71.x 10779 
rs6567160 18:55980115 MC4R C/T 0.233 0.048 0.004 0.084% 3.52 x 10°38 
rs17066842 18:56191604 MC4R G/A 0.960 0.051 0.008 0.020% 6.99 x 10710 
rs9944545 18:56109224 MC4R T/C 0.296 0.020 0.004 0.017% 1.01 x 10-8 
rs11030104 11:27641093 BDNF A/G 0.791 0.051 0.004 0.087% 1.26 x 10°34 
rs10835210 11:27652486 BDNF C/A 0.570 0.020 0.004 0.020% 1.25 x 10-8 
SNP positions are reported according to Build 36 and their alleles are coded based on the positive strand. Alleles (effect/other), EAF, estimated beta (f), s.e.m., explained variance, and P values from GCTA. First row 


at each locus represents lead signal, other row(s) represent secondary signals. 
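
The table does not state how the "variance explained" column was computed. As a rough check, a common approximation for a single biallelic SNP is 2p(1 − p)β², where p is the effect allele frequency and β is the per-allele effect on a trait standardized to unit variance. The sketch below applies that approximation to two rows of Table 3; the function name and the unit-variance assumption are illustrative and are not taken from the paper or from GCTA.

```python
def variance_explained(eaf: float, beta: float) -> float:
    """Approximate fraction of trait variance explained by one biallelic SNP.

    Assumes an additive model, Hardy-Weinberg proportions and a trait scaled
    to unit variance, so the fraction is 2 * p * (1 - p) * beta**2.
    """
    if not 0.0 <= eaf <= 1.0:
        raise ValueError("eaf must lie in [0, 1]")
    return 2.0 * eaf * (1.0 - eaf) * beta ** 2


# (EAF, beta) pairs copied from the lead MC4R and BDNF rows of Table 3.
table3_rows = {
    "rs6567160": (0.233, 0.048),
    "rs11030104": (0.791, 0.051),
}

for snp, (eaf, beta) in table3_rows.items():
    print(f"{snp}: {100 * variance_explained(eaf, beta):.3f}% of variance")
```

The values obtained this way are close to, but not identical to, the tabulated ones. That is expected, because the tabulated EAF and β are rounded and the original calculation may differ in detail.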


higher high-density lipoprotein cholesterol (HDL). At LOC646736 and 
IRS1, the BMI-increasing allele is associated with reduced risk of coronary 
artery disease (CAD) and diabetic nephropathy, decreased triglyceride 
levels, increased HDL, higher adiponectin, and lower fasting insulin. This 
may be due to increased subcutaneous fat and possible production of 
metabolic mediators protective against the development of metabolic 
disease despite increased adiposity⁸. These unexpected associations may
help us to understand better the complex pathophysiology underlying
these traits, and may indicate benefits or side effects if these regions
contain targets of therapeutic intervention. Furthermore, of our 97 GWS
loci, 35 (binomial P = 0.0019) were in high LD (r² > 0.7) with one or
more GWS SNPs in the National Human Genome Research Institute
(NHGRI) GWAS catalogue (P < 5 × 10⁻⁸), even after removing
anthropometric trait-associated SNPs. These SNPs were associated not only
with cardiometabolic traits, but also with schizophrenia, smoking
behaviour, irritable bowel syndrome, and Alzheimer’s disease
(Supplementary Table 17a, b).
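
Statements such as "more SNPs with effects in the anticipated direction than expected by chance" are typically backed by a one-sided binomial (sign) test. The following is a minimal sketch of such a test, assuming a null probability of 0.5 that any one SNP shows the anticipated direction. The count of 65 concordant SNPs is a made-up illustration, not a value from the paper, and the authors' exact test settings are not given in this section.

```python
from math import comb


def binomial_upper_tail(successes: int, trials: int, p_null: float = 0.5) -> float:
    """Return P(X >= successes) for X ~ Binomial(trials, p_null)."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    return sum(
        comb(trials, k) * p_null ** k * (1.0 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical example: 65 of 97 index SNPs act in the anticipated direction.
p_value = binomial_upper_tail(successes=65, trials=97)
print(f"one-sided binomial P = {p_value:.2e}")
```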


BMI tissues, biological pathways and gene sets 

We anticipated the expanded sample size would not only identify 
additional BMI-associated variants, but also more clearly highlight 
the biology implicated by genetic studies of BMI. By applying multiple 
complementary methods, we identified biologically relevant tissues, 
pathways and gene sets, and highlighted potentially causal genes at 
associated loci. These approaches comprised systematic methods
incorporating diverse data types, among them the novel approach Data-driven
Expression Prioritized Integration for Complex Traits (DEPICT)²¹, and
extensive manual review of the literature.

DEPICT used 37,427 human gene expression microarray samples to 
identify tissues and cell types in which genes near BMI-associated SNPs 
are highly expressed, and then tested for enrichment of specific tissues by 
comparing results with randomly selected loci matched for gene density. 
In total, 31 of the 209 tissues tested were significantly enriched, and 27 of
these 31 were in the central nervous system (CNS) (Fig. 2a and Supplementary Table 18).
Current results are not sufficient to isolate specific brain regions
important in regulating BMI. However, we observe enrichment not only in
the hypothalamus and pituitary gland—key sites of central appetite 
regulation—but even more strongly in the hippocampus and limbic sys- 
tem, tissues that have a role in learning, cognition, emotion and memory. 
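
The comparison with randomly selected matched loci described above amounts to an empirical (resampling-based) enrichment test. The sketch below shows the general idea only and is not DEPICT's implementation. It assumes that each locus has already been given a tissue expression score and that the null locus sets were matched for gene density upstream; the function name and the simulated scores are hypothetical.

```python
import random
from statistics import mean


def empirical_enrichment_p(observed_scores, null_score_sets):
    """Empirical P value for enrichment of a tissue at the associated loci.

    observed_scores: per-locus expression scores at the associated loci.
    null_score_sets: one list of per-locus scores for each matched random
        locus set. The +1 correction keeps the estimate away from zero.
    """
    observed = mean(observed_scores)
    at_least_as_extreme = sum(
        1 for scores in null_score_sets if mean(scores) >= observed
    )
    return (at_least_as_extreme + 1) / (len(null_score_sets) + 1)


# Simulated illustration: 97 loci with a modest upward shift in expression.
rng = random.Random(0)
associated = [rng.gauss(0.5, 1.0) for _ in range(97)]
matched_null = [[rng.gauss(0.0, 1.0) for _ in range(97)] for _ in range(1000)]
print(f"empirical P = {empirical_enrichment_p(associated, matched_null):.4f}")
```

With 1,000 null sets the smallest attainable P value is about 0.001, so testing many tissues at a stringent threshold requires correspondingly more null sets.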

As a complementary approach, we examined overlap of associated
variants at the 97 loci (r² > 0.7 with the lead SNP) with five regulatory
marks found in most of the 14 selected cell types from brain, blood, liver,
pancreatic islet and adipose tissue from the ENCODE Consortium²² and
Roadmap Epigenomics Project²³ (Supplementary Table 19a-c). We found
evidence of enrichment (P < 1.2 × 10⁻³) in 24 out of 41 data sets
examined. The strongest enrichment was observed with promoter (histone 3
Lys 4 trimethylation (H3K4me3), histone 3 Lys 9 acetylation (H3K9ac))
and enhancer (H3K4me1, H3K27ac) marks detected in mid-frontal
lobe, anterior caudate, astrocytes and substantia nigra, supporting a role
for neuronal tissues in BMI regulation.
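
LD thresholds such as r² > 0.7 are used throughout this section to decide which variants travel with a lead SNP. As a minimal sketch, r² can be approximated from unphased data as the squared Pearson correlation between two genotype dosage vectors (0, 1 or 2 copies of an allele). This genotype-based estimate is an assumption made for illustration; the section does not say which reference panel or estimator the authors used, and the genotype vectors below are invented.

```python
def ld_r2(genotypes_a, genotypes_b):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(genotypes_a)
    if n != len(genotypes_b) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 samples")
    mean_a = sum(genotypes_a) / n
    mean_b = sum(genotypes_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(genotypes_a, genotypes_b))
    var_a = sum((a - mean_a) ** 2 for a in genotypes_a)
    var_b = sum((b - mean_b) ** 2 for b in genotypes_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("r2 is undefined for a monomorphic SNP")
    return cov * cov / (var_a * var_b)


# Invented dosages for a lead SNP and a nearby variant in eight individuals.
lead_snp = [0, 1, 2, 1, 0, 2, 1, 0]
nearby_snp = [0, 1, 2, 1, 0, 2, 0, 0]
r2 = ld_r2(lead_snp, nearby_snp)
print(f"r2 = {r2:.2f}; passes r2 > 0.7 filter: {r2 > 0.7}")
```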

To identify pathways or gene sets implicated by the BMI-associated
loci, we first used Meta-Analysis Gene-set Enrichment of variaNT
Associations (MAGENTA)²⁴, which takes pre-annotated gene sets as input
and tests for overrepresentation of genes from each set at BMI-associated
loci. We found enrichment (false discovery rate (FDR) < 0.05) of seven
gene sets, including neurotrophin signalling. Other highlighted gene sets
were related to general growth and patterning: basal cell carcinoma, acute
myeloid leukaemia, and hedgehog signalling (Supplementary Table 20a, b).
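
The idea of testing for overrepresentation of a gene set among genes at associated loci can be illustrated with a classical hypergeometric test. This is a deliberately simplified sketch and not the MAGENTA procedure, which works from gene-level association scores. All counts in the example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(overlap: int, set_size: int, hits: int, universe: int) -> float:
    """P(X >= overlap) when `hits` genes are drawn without replacement from
    `universe` genes, of which `set_size` belong to the gene set."""
    if not (0 <= set_size <= universe and 0 <= hits <= universe):
        raise ValueError("set_size and hits must lie between 0 and universe")
    if overlap < 0:
        raise ValueError("overlap must be non-negative")
    upper = min(set_size, hits)
    total = comb(universe, hits)
    favourable = sum(
        comb(set_size, k) * comb(universe - set_size, hits - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / total


# Hypothetical counts: 20,000 genes, a 150-gene pathway, 400 genes at
# associated loci, 12 of which fall in the pathway.
p_value = hypergeom_upper_tail(overlap=12, set_size=150, hits=400, universe=20000)
print(f"over-representation P = {p_value:.2e}")
```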

Second, we used DEPICT, which uses predefined gene sets reconstituted
using coexpression data, to perform gene set enrichment analysis. After
merging highly correlated gene sets, nearly 500 gene sets were
significantly enriched (FDR < 0.05) for genes in BMI-associated loci (Fig. 2b
and Supplementary Table 21a, b). The most strongly enriched gene sets
highlight potentially novel pathways in the CNS. These include gene sets
related to synaptic function, long-term potentiation and neurotransmitter
signalling (glutamate signalling in particular, but also noradrenaline,
dopamine and serotonin release cycles, and GABA (γ-aminobutyric acid)
receptor activity; Fig. 2c). Potentially relevant mouse behavioural
phenotypes, such as physical activity and impaired coordination, were also
highly enriched (Fig. 2b and Supplementary Table 21a). Several gene
sets previously linked to obesity, such as integration of energy
metabolism, polyphagia, secretion and action of insulin and related hormones
(for example, ‘regulation of insulin secretion by glucagon-like peptide
1’ and ‘glucagon signalling in metabolic regulation’), mTOR signalling
(which affects cell growth in response to nutrient intake via insulin and
growth factors²⁵), and gene sets overlapping the neurotrophin signalling
pathway identified by MAGENTA were also enriched, although
not as significantly as other CNS processes (Fig. 2d). DEPICT also
identified significant enrichment for additional cellular components and
processes: calcium channels, MAP kinase activity, chromatin
organization and modification, and ubiquitin ligases.
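
Both enrichment analyses report results at FDR < 0.05. This section does not say how the FDR was estimated, so the sketch below shows one standard option, the Benjamini-Hochberg step-up procedure, purely as an illustration of what an FDR threshold means in practice. The P values are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so each q value is monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values


# Invented gene-set P values.
gene_set_p = {"set_A": 0.0004, "set_B": 0.012, "set_C": 0.03, "set_D": 0.41}
q = benjamini_hochberg(list(gene_set_p.values()))
for (name, p), q_value in zip(gene_set_p.items(), q):
    flag = "enriched" if q_value < 0.05 else "not enriched"
    print(f"{name}: P = {p}, q = {q_value:.4f} ({flag})")
```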

Third, we manually reviewed literature related to all 405 genes within
500 kb and r² > 0.2 of the 97 index SNPs. We classified these genes into
one or more biological categories, and observed 25 categories containing 
three or more genes (Supplementary Table 22). The largest category 
comprised genes involved in neuronal processes, including monogenic 
obesity genes involved in hypothalamic function and energy homeosta- 
sis, and genes involved in neuronal transmission and development. 
Other processes highlighted by the manual literature review included 
glucose and lipid homeostasis and limb development, which were less 
notable in the above methods, but may still be related to the underlying 
biology of BMI. 

Figure 2 | Tissues and reconstituted

To identify specific genes that may account for BMI association, we
considered each of the following to represent supportive evidence for a
gene within a locus: (1) the gene nearest the index SNP²⁶; (2) genes
containing missense, nonsense or copy number variants, or a cis-expression
quantitative trait locus (eQTL) in LD with the index SNP; (3) genes
prioritized by integrative methods implemented in DEPICT; (4) genes
prioritized by connections in published abstracts by GRAIL (Gene
Relationships Across Implicated Loci)²⁷; or (5) genes biologically related
to obesity, related metabolic disease, or energy expenditure based on
manual literature review (Tables 1 and 2, Extended Data Tables 2-4 and
Supplementary Tables 23-25). We first focused on the 64 genes in
associated loci with more than one consistent line of supporting
evidence. As expected, many of these genes overlap with CNS processes,
including synaptic function, cell-cell adhesion, and glutamate signalling
(ELAVL4, GRID1, CADM2, NRXN3, NEGR1 and SCG3), cause monogenic
obesity syndromes (MC4R, BDNF, BBS4 and POMC), or function
in extreme/early onset obesity in humans and mouse models (SH2B1
and NEGR1)²⁸,²⁹. Other genes with several lines of supporting
evidence are related to insulin secretion and action, energy metabolism, lipid
biology, and/or adipogenesis (TCF7L2, GIPR, IRS1, FOXO3, ASB4, 
RPTOR, NPC1, CREB1, FAM57B, APOBR and HSD17B12), encode 
RNA binding/processing proteins (PTBP2, ELAVL4, CELF1 and pos- 
sibly RALYL), are in the MAP kinase signalling pathway (MAP2K5 and 
MAPK3), or regulate cell proliferation or cell survival (FAIM2, PARK2 
and OLFM4). Although we cannot be certain that any individual gene 
is related to the association at a given locus, the strong enrichment of 
pathways among genes within associated loci argues for a causal role 
for these pathways, prioritizes specific genes for follow-up experiments, 
and provides the strongest genetic evidence so far for a role of particular 
biological and CNS processes in the regulation of human body mass. 
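
The prioritization described above, in which a gene gains support from each independent line of evidence and genes with more than one line are examined first, can be sketched as a simple tally. The evidence labels follow the five criteria in the text, but the gene names and their assignments below are placeholders invented for illustration and do not reproduce the paper's tables.

```python
from collections import defaultdict


def genes_with_multiple_evidence(evidence, min_lines=2):
    """Map each gene to its supporting evidence lines, keeping only genes
    supported by at least `min_lines` distinct lines.

    evidence: dict mapping an evidence-line name to a set of gene names.
    """
    support = defaultdict(set)
    for line, genes in evidence.items():
        for gene in genes:
            support[gene].add(line)
    return {
        gene: sorted(lines)
        for gene, lines in support.items()
        if len(lines) >= min_lines
    }


# Placeholder genes and assignments (hypothetical).
evidence = {
    "nearest_gene": {"GENE_A", "GENE_B", "GENE_C"},
    "coding_variant_or_eqtl": {"GENE_A"},
    "depict": {"GENE_A", "GENE_C"},
    "grail": {"GENE_D"},
    "literature": {"GENE_C", "GENE_D"},
}

for gene, lines in sorted(genes_with_multiple_evidence(evidence).items()):
    print(f"{gene}: {len(lines)} lines of evidence ({', '.join(lines)})")
```

A tally of this kind treats every evidence line as equally informative, which is a simplification; some lines (for example nearest gene) are weaker than others.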


Discussion 


Our meta-analysis of nearly 340,000 individuals identified 97 GWS loci
associated with BMI, 56 of which are novel. These loci account for 2.7%
of the variation in BMI, and suggest that as much as 21% of BMI
variation can be accounted for by common genetic variation. Our
analyses provide robust evidence to implicate particular genes and
pathways affecting BMI, including synaptic plasticity and glutamate receptor
activity. These pathways respond to changes in feeding and fasting, are
regulated by key obesity-related molecules such as BDNF and MC4R,
and impinge on key hypothalamic circuits³⁰⁻³². They also
overlap with one of the several proposed mechanisms of action of
topiramate, a component of one of two weight-loss drugs approved by the US
Food and Drug Administration³³,³⁴. This observation suggests that the
relevant site of action for this drug may be glutamate receptor activity,
supporting the idea that these genes and pathways could reveal more 
targets for weight-loss therapies. BMI-associated loci also overlap with 
genes and pathways implicated in neurodevelopment (Supplementary 
Tables 21 and 22). Finally, consistent with previous work and findings 
from monogenic obesity syndromes, we confirm a role for the CNS— 
particularly genes expressed in the hypothalamus—in the regulation of 
body mass. 

Examining the genes at BMI-associated loci in the context of gene 
expression, molecular pathways, eQTL results, mutational evidence and 
genomic location provides several complementary avenues through 
which to prioritize genes for relevance in BMI biology. Genes such as 
NPC1 and ELAVL4 are implicated by many lines of evidence (literature, 
mutational, eQTL and DEPICT) and become strong candidate genes in 
their respective locations. It is important to recognize that pathway 
methods and literature reviews are limited by current data sets and 
knowledge, and thus provide only a working model of obesity biology. 
For example, little is known about host genetic factors that regulate the 
microbiome. Variation in immune-related genes such as TLR4 could
presumably exert an influence on obesity through the microbiome³⁵.
Together, our results underscore the heterogeneous aetiology of obesity 
and its links with several related metabolic diseases and processes. 

BMI variants are generally associated with related cardiometabolic 
traits in accord with established epidemiological relationships. This 
could be due to shared genetic effects or to other causes of cross- 
phenotypic correlations. However, some BMI-associated variants have 
effects on related traits counter to epidemiological expectations. Once 
better understood, these mechanisms may not only help to explain why 
not all obese individuals develop related metabolic diseases, but also 
suggest possible mechanisms to prevent development of metabolic dis- 
ease in those who are already obese. 

Larger studies of common genetic variation, studies of rare variation 
(including those based on imputation, exome chips and sequencing), 
and improved computational tools will continue to identify genetic vari- 
ants associated with BMI and help to further refine the biology of obesity. 
The 97 loci identified here represent an important step in understanding 
the physiological mechanisms leading to obesity. These findings streng- 
then the connection between obesity and other metabolic diseases, enhan- 
ce our appreciation of the tissues, physiological processes, and molecular 
pathways that contribute to obesity, and will guide future research aimed 
at unravelling the complex biology of obesity. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 


Received 20 November 2013; accepted 23 December 2014. 


1. Maes, H. H., Neale, M. C. & Eaves, L. J. Genetic and environmental factors in relative
body weight and human adiposity. Behav. Genet. 27, 325-351 (1997).

2. Visscher, P. M., Brown, M. A., McCarthy, M. I. & Yang, J. Five years of GWAS
discovery. Am. J. Hum. Genet. 90, 7-24 (2012).

3. Zaitlen, N. et al. Using extended genealogy to estimate components of heritability
for 23 quantitative and dichotomous traits. PLoS Genet. 9, e1003520 (2013).

4. Fall, T. & Ingelsson, E. Genome-wide association studies of obesity and metabolic
syndrome. Mol. Cell. Endocrinol. 382, 740-757 (2014).

5. Speliotes, E. K. et al. Association analyses of 249,796 individuals reveal 18 new loci
associated with body mass index. Nature Genet. 42, 937-948 (2010).

6. Willer, C. J. et al. Six new loci associated with body mass index highlight a neuronal
influence on body weight regulation. Nature Genet. 41, 25-34 (2009).

7. Voight, B. F. et al. The metabochip, a custom genotyping array for genetic studies of
metabolic, cardiovascular, and anthropometric traits. PLoS Genet. 8, e1002793
(2012).

8. Kilpeläinen, T. O. et al. Genetic variation near IRS1 associates with reduced
adiposity and an impaired metabolic profile. Nature Genet. 43, 753-760 (2011).

9. Bradfield, J. P. et al. A genome-wide association meta-analysis identifies new
childhood obesity loci. Nature Genet. 44, 526-531 (2012).

10. Monda, K. L. et al. A meta-analysis identifies new loci associated with body mass
index in individuals of African ancestry. Nature Genet. 45, 690-696 (2013).

11. Berndt, S. I. et al. Genome-wide meta-analysis identifies 11 new loci for
anthropometric traits and provides insights into genetic architecture. Nature
Genet. 45, 501-512 (2013).

12. Guo, Y. et al. Gene-centric meta-analyses of 108 912 individuals confirm known
body mass index loci and reveal three novel signals. Hum. Mol. Genet. 22, 184-201
(2013).

13. Wood, A. R. et al. Defining the role of common variation in the genomic and
biological architecture of adult human height. Nature Genet. 46, 1173-1186
(2014).

14. Maller, J. B. et al. Bayesian refinement of association signals for 14 loci in 3
common diseases. Nature Genet. 44, 1294-1301 (2012).

15. Wakefield, J. A Bayesian measure of the probability of false discovery in genetic
epidemiology studies. Am. J. Hum. Genet. 81, 208-227 (2007).

16. Peters, U. et al. A systematic mapping approach of 16q12.2/FTO and BMI in more
than 20,000 African Americans narrows in on the underlying functional variation:
results from the Population Architecture using Genomics and Epidemiology
(PAGE) study. PLoS Genet. 9, e1003171 (2013).

17. Juster, F. T. & Suzman, R. An overview of the Health and Retirement Study. J. Hum.
Resour. 30, S7-S56 (1995).

18. Bouchonville, M. et al. Weight loss, exercise or both and cardiometabolic risk
factors in obese older adults: results of a randomized controlled trial. Int. J. Obes.
38, 423-431 (2013).

19. Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide
complex trait analysis. Am. J. Hum. Genet. 88, 76-82 (2011).

20. Yang, J. et al. Conditional and joint multiple-SNP analysis of GWAS summary
statistics identifies additional variants influencing complex traits. Nature Genet.
44, 369-375 (2012).

202 | NATURE | VOL 518 | 12 FEBRUARY 2015 


21. Pers, T. H. et al. Biological interpretation of genome-wide association studies using
predicted gene functions. Nat. Commun. 5, 5890 (2014).

22. The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in
the human genome. Nature 489, 57-74 (2012).

23. Bernstein, B. E. et al. The NIH Roadmap Epigenomics Mapping Consortium. Nature
Biotechnol. 28, 1045-1048 (2010).

24. Segré, A. V., Groop, L., Mootha, V. K., Daly, M. J. & Altshuler, D. Common inherited
variation in mitochondrial genes is not enriched for associations with type 2
diabetes or related glycemic traits. PLoS Genet. 6, e1001058 (2010).

25. Wullschleger, S., Loewith, R. & Hall, M. N. TOR signaling in growth and metabolism.
Cell 124, 471-484 (2006).

26. Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological
pathways affect human height. Nature 467, 832-838 (2010).

27. Raychaudhuri, S. et al. Identifying relationships among genomic disease regions:
predicting genes at pathogenic SNP associations and rare deletions. PLoS Genet.
5, e1000534 (2009).

28. Mägi, R. et al. Contribution of 32 GWAS-identified common variants to severe obesity
in European adults referred for bariatric surgery. PLoS ONE 8, e70735 (2013).

29. Lee, A. W. et al. Functional inactivation of the genome-wide association study
obesity gene neuronal growth regulator 1 in mice causes a body mass phenotype.
PLoS ONE 7, e41537 (2012).

30. Yang, Y., Atasoy, D., Su, H. H. & Sternson, S. M. Hunger states switch a flip-flop
memory circuit via a synaptic AMPK-dependent positive feedback loop. Cell 146,
992-1003 (2011).

31. Wu, Q., Clark, M. S. & Palmiter, R. D. Deciphering a neuronal circuit that mediates
appetite. Nature 483, 594-597 (2012).

32. Shen, Y., Fu, W. Y., Cheng, E. Y., Fu, A. K. & Ip, N. Y. Melanocortin-4 receptor regulates
hippocampal synaptic plasticity through a protein kinase A-dependent
mechanism. J. Neurosci. 33, 464-472 (2013).

33. Gibbs, J. W., III, Sombati, S., DeLorenzo, R. J. & Coulter, D. A. Cellular actions of
topiramate: blockade of kainate-evoked inward currents in cultured hippocampal
neurons. Epilepsia 41 (suppl. 1), S10-S16 (2000).

34. Poulsen, C. F. et al. Modulation by topiramate of AMPA and kainate mediated
calcium influx in cultured cerebral cortical, hippocampal and cerebellar neurons.
Neurochem. Res. 29, 275-282 (2004).

35. Henao-Mejia, J. et al. Inflammasome-mediated dysbiosis regulates progression of
NAFLD and obesity. Nature 482, 179-185 (2012).

36. Pruim, R. J. et al. LocusZoom: regional visualization of genome-wide association
scan results. Bioinformatics 26, 2336-2337 (2010).


Supplementary Information is available in the online version of the paper. 


Acknowledgements A full list of acknowledgements can be found in the 
Supplementary Information. 


Author Contributions A full list of author contributions can be found in the 
Supplementary Information. 


Author Information Reprints and permissions information is available at 
www.nature.com/reprints. The authors declare competing financial interests: details 
are available in the online version of the paper. Readers are welcome to comment on 
the online version of the paper. Correspondence and requests for materials should be 
addressed to E.K.S. (espeliot@med.umich.edu), R.J.F.L. (ruth.loos@mssm.edu), and
J.N.H. (joelh@broadinstitute.org). 


Adam E. Locke!*, Bratati Kahali2*, Sonja |. Berndt?*, Anne E. Justice**, Tune H. 
Pers>:®78* Felix R. Day®, Corey Powell, Sailaja Vedantam®®, Martin L. Buchkovich?®, 
Jian Yang!! 12, Damien C. Croteau-Chonka?®!3, Tonu Esko?:®”""4, Tove Fall?>:1627, 
Teresa Ferreira!®, Stefan Gustafsson®”, Zoltan Kutalik!?°7! Jian’an Luan®, Reedik 
Mägi!+1® Joshua C. Randall?®*?, Thomas W. Winkler?°, Andrew R. Wood?4,
Tsegaselassie Workalemahu’, Jessica D. Faul?®, Jennifer A. Smith?”, Jing Hua Zhao®, 
Wei Zhao?’, Jin Chen2®, Rudolf Fehrmann22, Asa K. Hedman?©?718 Juha 
Karjalainen, Ellen M. Schmidt®°, Devin Absher*?, Najaf Amin°2, Denise Anderson??, 
Marian Beekman®*°°, Jennifer L. Bolton®®, Jennifer L. Bragg-Gresham? >’, Steven
Buyske?®?, Ayse Demirkan?**°, Guohong Deng*!4249, Georg B. Ehret***°, Bjarke 
Feenstra*®, Mary F. Feitosa*’, Krista Fischer!+, Anuj Goel!®48, Jian Gong*®, Anne U. 
Jackson!, Stavroula Kanoni®°°, Marcus E. Kleber®!'5%, Kati Kristiansson®?, Unhee Lim®‘, 
Vaneet Lotay®°, Massimo Mangino®®, Irene Mateo Leach®’, Carolina 
Medina-Gomez®252©° Sarah E. Medland®!, Michael A. Nalls©*, Cameron D. Palmer™®,
Dorota Pasko**, Sonali Pechlivanis®?, Marjolein J. Peters°®°°, Inga Prokopenko!®:9465, 
Dmitry Shungin®©”©, Alena Stanéakova®’, Rona J. Strawbridge”°, Yun Ju Sung’!, 
Toshiko Tanaka’, Alexander Teumer’®, Stella Trompet’*”°, Sander W. van der 
Laan’®, Jessica van Setten’’, Jana V. Van Vliet-Ostaptchouk’®, Zhaoming Wang?”?, 
Loic Yengo®°®18* Weihua Zhang‘? ®?, Aaron Isaacs****, Eva Albrecht®°, Johan 
Arnlov!®!78° Gillian M. Arscott®’, Antony P. Attwood®®®, Stefania Bandinelli2°, Amy 
Barrett®, Isabelita N. Bas?!, Claire Bellis?*°?, Amanda J. Bennett®, Christian Berne“, 
Roza Blagieva?°, Matthias Bltiher?®®’, Stefan Bohringer?*%8, Lori L. Bonnycastle??, 
Yvonne Bottcher?®, Heather A. Boyd*®, Marcel Bruinenberg?°°, Ida H. Caspersen?°?, 
Yii-Der Ida Chen?°210?, Robert Clarke!™, E. Warwick Daw’, Anton J. M. de Craen’®, 
Graciela Delgado”, Maria Dimitriou!©°, Alex S. F. Doney?°®, Niina Eklund®?1°”, Karol 
Estrada®©°!08 Elodie Eury®°®!8?, Lasse Folkersen’°, Ross M. Fraser®®, Melissa E. 
Garcia!©?, Frank Geller*®, Vilmantas Giedraitis?!°, Bruna Gigante’!?, Alan S. Go'!?, 
Alain Golay??3, Alison H. Goodall?!*1!5, Scott D. Gordon®!, Mathias Gorski23116, 
Hans-Jérgen Grabe!!7118 Harald Grallert®>1!91°, Tanja B. Grammer®!, Jurgen 
GraBler!*!, Henrik Grénberg?®, Christopher J. Groves®*, Gaélle Gusto?*?, Jeffrey 


©2015 Macmillan Publishers Limited. All rights reserved 


Haessler*?, Per Hall’®, Toomas Haller!4, Goran Hallmans??? Catharina A. Hartman?24, 
Maija Hassinen?@°, Caroline Hayward!@®, Nancy L. Heard-Costa!?”!?°, Quinta
Helmer?*°529, Christian Hengstenberg'*°3!, Oddgeir Holmen**?, Jouke-Jan 
Hottenga!$$, Alan L. James!3435, Janina M. Jeff>°, Asa Johansson?=® Jennifer 
Jolley®®®9, Thorhildur Juliusdottir!®, Leena Kinnunen®, Wolfgang Koenig™, Markku 
Koskenvuo!9’, Wolfgang Kratzer!$® Jaana Laitinen’®°, Claudia Lamina’*®, Karin 
Leander!!! Nanette R. Lee®!, Peter Lichtner!*1, Lars Lind’*?, Jaana Lindstrom®?, Ken 
Sin Lo!43, Stéphane Lobbens®°*1**, Roberto Lorbeer!*, Yingchang Lue5145, 
Francois Mach*®, Patrik K. E. Magnusson?>, Anubha Mahajan?®, Wendy L. McArdle?4°, 
Stela McLachlan®®, Cristina Menni°>, Sigrun Merger”®, Evelin Mihailov!“ "47, Lili 
Milani!*, Alireza Moayyeri?©*4®, Keri L. Monda*?*?, Mario A. Morken?%, Antonella
Mulas!©°, Gabriele Müller?>!, Martina Müller-Nurasyid®®) 2192188, Arthur W.
Musk!54, Ramaiah Nagarajat®>, Markus M. Nöthen!°°?97  Ilja M. Nolte!®®, Stefan
Pilz!591©° Nigel W. Rayner!®2. Frida Renstrom®, Rainer Rettig'®!, Janina S. 
Ried®°, Stephan Ripke!°®16, Neil R. Robertson!®**, Lynda M. Rose!®, Serena 
Sanna?®°, Hubert Scharnagl!©*, Salome Scholtens!°, Fredrick R. Schumacher?®, 
William R. Scott*!*3, Thomas Seufferlein!?8, Jianxin Shi*®, Albert Vernon 
Smith?®”1 Joanna Smolonska??!®, Alice V. Stanton?”°, Valgerdur 
Steinthorsdottir!”1, Kathleen Stirrups?*°, Heather M. Stringham!, Johan 
Sundstrém?4?, Morris A. Swertz?®, Amy J. Swift??, Ann-Christine Syvanen!®172, 
Sian-Tsung Tan*?!’3, Bamidele O. Tayo!”*, Barbara Thorand!2°1”5, Gudmar 
Thorleifsson’”!, Jonathan P. Tyrer?”©, Hae-Won Uh**"8, Liesbeth Vandenput?!’’, 
Frank C. Verhulst!78, Sita H. Vermeulen?72!®°, Niek Verweij®”, Judith M. Vonk!®9, 
Lindsay L. Waite?!, Helen R. Warren!®!, Dawn Waterworth?®2, Michael N. Weedon**, 
Lynne R. Wilkens°“, Christina Willenborg!®*!54, Tom Wilsgaard?®°, Mary K. 
Wojczynski*”, Andrew Wong?®®, Alan F. Wright'2°, Qunyuan Zhang*’, The LifeLines 
Cohort Studyt, Eoin P. Brennan?®”, Murim Choi!®®, Zari Dastani!®”, Alexander W. 
Drong?®, Per Eriksson’°, Anders Franco-Cereceda!”°, Jesper R. Gadin’°, Ali G. 
Gharavi!?!, Michael E. Goddard?9#19°, Robert E. Handsaker®”, Jinyan Huang!°4299, 
Fredrik Karpe®*!?®, Sekar Kathiresan®?9”, Sarah Keildson?8, Krzysztof Kiryluk?®?, 
Michiaki Kubo!®, Jong-Young Lee??¥, Liming Liang?®*?° Richard P. Lifton2°?,
Baoshan Ma??? Steven A. McCarroll®”1?, Amy J. McKnight2°, Josine L. Min?4°,
Miriam F. Moffatt?’S, Grant W. Montgomery®?, Joanne M. Murabito’?”:2*, George
Nicholson202°% Dale R. Nyholt®!:29”, Yukinori Okada2°®0, John R. B. Perrys
Rajkumar Dorajoo*?°, Eva Reinmaa!*, Rany M. Salem®:®’, Niina Sandholm?!4:212.213° 
Robert A. Scott”, Lisette Stolk?*®°, Atsushi Takahashi?°2, Toshihiro Tanaka20?214.215 
Ferdinand M. van 't Hooft”°, Anna A. E. Vinkhuyzen'?, Harm-Jan Westra?°, Wei 
Zheng’, Krina T. Zondervan'®!”, The ADIPOGen Consortium†, The AGEN-BMI
Working Group†, The CARDIOGRAMplusC4D Consortium†, The CKDGen
Consortium†, The GLGC†, The ICBP†, The MAGIC Investigators†, The MUTHER
Consortium†, The MiGen Consortium, The PAGE Consortium†, The ReproGen
Consortium†, The GENIE Consortium†, The International Endogene Consortium†,
Andrew C. Heath?!®, Dominique Arveiler*?9, Stephan J. L. Bakker*2°, John Beilby®”:22?, 
Richard N. Bergman?2, John Blangero?2, Pascal Bovet?*3224, Harry Campbell®®, 
Mark J. Caulfield'®!, Giancarlo Cesana®2°, Aravinda Chakravarti**, Daniel |. 
Chasman?®?26 Peter S. Chines??, Francis S. Collins?2, Dana C. Crawford?2”228, 
L. Adrienne Cupples!2”?°, Daniele Cusi22°?!, John Danesh?°?, Ulf de Faire?!}, 
Hester M. den Ruijter”©793, Anna F. Dominiczak?5+, Raimund Erbel**°, Jeanette 
Erdmann?®984 Johan G. Eriksson®?:73°237, Martin Farrall?®*8, Stephan B. 
Felix23®239 Ele Ferrannini2*°*4!, Jean Ferri@res?*, lan Ford?4%, Nita G. Forouhi?, 
Terrence Forrester2*4, Oscar H. Franco®®°?, Ron T. Gansevoort22°, Pablo V. 
Gejman?*°, Christian Gieger®°, Omri Gottesman®®, Vilmundur Gudnason!®”1© Ulf 
Gyllensten?’®, Alistair S. Hall?#°, Tamara B. Harris?©°, Andrew T. Hattersley”, Andrew 
A. Hicks48, Lucia A. Hindorff2*?, Aroon D. Hingorani*°°, Albert Hofman°®°°, Georg 
Homuth73, G. Kees Hovingh@5!, Steve E. Humphries?°2, Steven C. Hunt?°, Elina 
Hyppénen29*295:256257 Thomas Illig?!95°, Kevin B. Jacobs*:”?, Marjo-Riitta 
Jarvelin®?:259.260.261,.263263 Kar|_Heinz Jéckel®, Berit Johansen!©!, Pekka 
Jousilahti?’, J. Wouter Jukema’*26*265 Antti M. Jula°?, Jaakko Kaprio®?:!07137, John 
J.P. Kastelein25?, Sirkka M. Keinanen-Kiukaanniemi2°226° Lambertus A. 
Kiemeney?’?:2°7, Paul Knekt®?, Jaspal S. Kooner4?:173:268 Charles Kooperberg*®, 
Peter Kovacs?©?”, Aldi T. Kraja*”, Meena Kumari*©?*”°, Johanna Kuusisto?”!, Timo A. 
Lakka!2572.273 Claudia Langenberg?*©, Loic Le Marchand™, Terho Lehtimaki2”*, 
Valeriya Lyssenko*’°7°, Satu Mannisto°°, André Marette2””*78, Tara C. 
Colin A. McKenzie2**, Barbara McKnight?’?, Frans L. Moll?®°, Andrew D. Morris!°°, 
Andrew P. Morris'*185! Jeffrey C. Murray*®, Mari Nelis'*, Claes Ohlsson?7’, 
Albertine J. Oldehinkel!*, Ken K. Ong??®°, Pamela A. F. Madden?!®, Gerard 
Pasterkamp’®, John F. Peden?®%, Annette Peters!1919°75 Dirkje S. Postma?®4, Peter 
P. Pramstaller?*®85, Jackie F. Price®®, Lu Qi!?25, Olli T. Raitakari2®©8”, Tuomo 
Rankinen?°®, D. C. Rao*”7218 Treva K. Rice’?2!8 Paul M. Ridker!©??25 John D. 
Rioux!43282, Marylyn D. Ritchie?°°, Igor Rudan?®79", Veikko Salomaa®’, Nilesh J. 
Samani!!*115, Jouko Saramies?%%, Mark A. Sarzynski2®8, Heribert Schunkert!3°13?, 
Peter E. H. Schwarz!*193, Peter Sever22* Alan R. Shuldiner?2529®797 Juha 
Sinisalo*°®, Ronald P. Stolk?®°, Konstantin Strauch®®153, Anke Ténjes?©?”, 
David-Alexandre Trégouat?923003°! Angelo Tremblay2°2, Elena Tremoli?°?, Jarmo 
Virtamo®?, Marie-Claude VohI?789°%, Uwe Vélker’*39, Gérard Waeber=°°, Gonneke 
Willemsen??? Jacqueline C. Witteman®®, M. Carola Zillikens°®°°, Linda S. Adair2°°, 
Philippe Amouyel°°”, Folkert W. Asselbergs*°°*°*3°°, Themistocles L. Assimes*°?, 
Murielle Bochud?23224, Bernhard O. Boehm?!°3!!, Eric Boerwinkle?!2, Stefan R.
Bornstein’!, Erwin P. Bottinger®°, Claude Bouchard?°®, Stéphane Cauchi®°®! 82, 
John C. Chambers*!®*:268 Stephen J. Chanock®, Richard S. Cooper?”4, Paul |. W. de 
Bakker’”3!3314 George Dedoussis!°°, Luigi Ferrucci’”*, Paul W. Franks2°°°97, 
Philippe Froguel®>®°%! 2 Leif C. Groop!0”?76, Christopher A. Haiman?©, Anders 
Hamsten’°, Jennie Hui®”22!315, David J. Hunter??251%4 Kristian Hveem32, Robert C. 
Kaplan?!®, Mika Kivimaki°?, Diana Kuh?®®, Markku Laakso®”?, Yongmei Liu??’, 
Nicholas G. Martin®?, Winfried März°?'16+38 Mads Melbye®°?3?°, Andres
Metspalu’*!4”, Susanne Moebus®°, Patricia B. Munroe?®!, Inger Njølstad'®5, Ben A.
Oostra?*8432° Colin N. A. Palmer?°® Nancy L. Pedersen?®, Markus Perola!+9?:107, 




Louis Pérusse?”®°, Ulrike Peters*?, Chris Power?°’, Thomas Quertermous?”, 
Rainer Rauramaa!25273 Fernando Rivadeneira®®>?°° Timo E. Saaristo®?!322, Danish 
Saleheen?3*32354, Naveed Sattar?*°, Eric E. Schadt?2°, David Schlessinger?®9, 

P. Eline Slagboom?*?, Harold Snieder?®, Tim D. Spector®®, Unnur 
Thorsteinsdottir!”13?”, Michael Stumvoll9®2”, Jaakko Tuomilehto®?328329390 André 
G. Uitterlinden®®5", Matti Uusitupa???39?, Pim van der Harst??°”76*, Mark 
Walker??3, Henri Wallaschofski2???94, Nicholas J. Wareham®, Hugh Watkins!®*8, 
David R. Weir2®, H-Erich Wichmann?3°33337| James F. Wilson?®, Pieter Zanen2*°, 
Ingrid B. Borecki*”, Panos Deloukas*”°°3°, Caroline S. Fox??’, Iris M. Heid??®9, 
Jeffrey R. O’Connell#?>°°, David P. Strachan*“°, Kari Stefansson!’!3?”, Cornelia M. 
van Duijne?585984 Goncalo R. Abecasis!, Lude Franke*®, Timothy M. Frayling**, Mark 
|. McCarthy?® 634! Peter M. Visscher!!7, André Scherag®?**?, Cristen J. 
Willer283°343 Michael Boehnke?, Karen L. Mohlke?®, Cecilia M. Lindgren®?8, Jacques 
S. Beckmann2°2!344 [Inés Barroso2?34534® Kari E. North*?47§, Erik 
Ingelsson!®1718§, Joel N. Hirschhorn®®78, Ruth J. F. Loos?°5145348g & Elizabeth K. 
Speliotes?8 


1Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann 
Arbor, Michigan 48109, USA. 2Department of Internal Medicine, Division of 
Gastroenterology, and Department of Computational Medicine and Bioinformatics, 
University of Michigan, Ann Arbor, Michigan 48109, USA. 3Division of Cancer 
Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, 
Bethesda, Maryland 20892, USA. 4Department of Epidemiology, University of North 
Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA. 5Divisions of 
Endocrinology and Genetics and Center for Basic and Translational Obesity Research, 
Boston Children’s Hospital, Boston, Massachusetts 02115, USA. 6Broad Institute of the 
Massachusetts Institute of Technology and Harvard University, Cambridge, 
Massachusetts 02142, USA. 7Department of Genetics, Harvard Medical School, Boston, 
Massachusetts 02115, USA. 8Center for Biological Sequence Analysis, Department of 
Systems Biology, Technical University of Denmark, Lyngby 2800, Denmark. 9MRC 
Epidemiology Unit, University of Cambridge School of Clinical Medicine, Institute of 
Metabolic Science, Cambridge Biomedical Campus, Cambridge CB2 0QQ, UK. 
10Department of Genetics, University of North Carolina, Chapel Hill, North Carolina 
27599, USA. 11Queensland Brain Institute, The University of Queensland, Brisbane 4072, 
Australia. 12The University of Queensland Diamantina Institute, The Translation Research 
Institute, Brisbane 4012, Australia. 13Channing Division of Network Medicine, 
Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, 
Boston, Massachusetts 02115, USA. 14Estonian Genome Center, University of Tartu, Tartu 
51010, Estonia. 15Department of Medical Epidemiology and Biostatistics, Karolinska 
Institutet, Stockholm 17177, Sweden. 16Science for Life Laboratory, Uppsala University, 
Uppsala 75185, Sweden. 17Department of Medical Sciences, Molecular Epidemiology, 
Uppsala University, Uppsala 75185, Sweden. 18Wellcome Trust Centre for Human 
Genetics, University of Oxford, Oxford OX3 7BN, UK. 19Institute of Social and Preventive 
Medicine (IUMSP), Centre Hospitalier Universitaire Vaudois (CHUV), Lausanne 1010, 
Switzerland. 20Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland. 
21Department of Medical Genetics, University of Lausanne, Lausanne 1005, Switzerland. 
22Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK. 23Department of 
Genetic Epidemiology, Institute of Epidemiology and Preventive Medicine, University of 
Regensburg, D-93053 Regensburg, Germany. 24Genetics of Complex Traits, University of 
Exeter Medical School, University of Exeter, Exeter EX1 2LU, UK. 25Department of 
Nutrition, Harvard School of Public Health, Boston, Massachusetts 02115, USA. 26Survey 
Research Center, Institute for Social Research, University of Michigan, Ann Arbor, 
Michigan 48104, USA. 27Department of Epidemiology, University of Michigan, Ann Arbor, 
Michigan 48109, USA. 28Department of Internal Medicine, Division of Cardiovascular 
Medicine, University of Michigan, Ann Arbor, Michigan 48109, USA. 29Department of 
Genetics, University Medical Center Groningen, University of Groningen, 9700 RB 
Groningen, The Netherlands. 30Department of Computational Medicine and 
Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA. 31HudsonAlpha 
Institute for Biotechnology, Huntsville, Alabama 35806, USA. 32Genetic Epidemiology 
Unit, Department of Epidemiology, Erasmus MC University Medical Center, 3015 GE 
Rotterdam, The Netherlands. 33Telethon Institute for Child Health Research, Centre for 
Child Health Research, The University of Western Australia, Perth, Western Australia 6008, 
Australia. 34Netherlands Consortium for Healthy Aging (NCHA), Leiden University Medical 
Center, Leiden 2300 RC, The Netherlands. 35Department of Molecular Epidemiology, 
Leiden University Medical Center, 2300 RC Leiden, The Netherlands. 36Centre for 
Population Health Sciences, University of Edinburgh, Teviot Place, Edinburgh EH8 9AG, 
UK. 37Kidney Epidemiology and Cost Center, University of Michigan, Ann Arbor, Michigan 
48109, USA. 38Department of Statistics & Biostatistics, Rutgers University, Piscataway, 
New Jersey 08854, USA. 39Department of Genetics, Rutgers University, Piscataway, New 
Jersey 08854, USA. 40Department of Human Genetics, Leiden University Medical Center, 
2333 ZC Leiden, The Netherlands. 41Ealing Hospital NHS Trust, Middlesex UB1 3HW, UK. 
42Department of Gastroenterology and Hepatology, Imperial College London, London W2 
1PG, UK. 43Institute of Infectious Diseases, Southwest Hospital, Third Military Medical 
University, Chongqing, China. 44Center for Complex Disease Genomics, 
McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of 
Medicine, Baltimore, Maryland 21205, USA. 45Cardiology, Department of Specialties of 
Internal Medicine, Geneva University Hospital, Geneva 1211, Switzerland. 46Department 
of Epidemiology Research, Statens Serum Institut, Copenhagen DK-2300, Denmark. 
47Department of Genetics, Washington University School of Medicine, St Louis, Missouri 
63110, USA. 48Division of Cardiovascular Medicine, Radcliffe Department of Medicine, 
University of Oxford, Oxford OX3 9DU, UK. 49Division of Public Health Sciences, Fred 
Hutchinson Cancer Research Center, Seattle, Washington 98109, USA. 50William Harvey 
Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary 
University of London, London EC1M 6BQ, UK. 51Vth Department of Medicine (Nephrology, 
Hypertensiology, Endocrinology, Diabetology, Rheumatology), Medical Faculty of 
Mannheim, University of Heidelberg, D-68187 Mannheim, Germany. 52Department of 


12 FEBRUARY 2015 | VOL 518 | NATURE | 203 


©2015 Macmillan Publishers Limited. All rights reserved 




Internal Medicine II, Ulm University Medical Centre, D-89081 Ulm, Germany. 53National 
Institute for Health and Welfare, FI-00271 Helsinki, Finland. 54Epidemiology Program, 
University of Hawaii Cancer Center, Honolulu, Hawaii 96813, USA. 55The Charles 
Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, 
New York, New York 10029, USA. 56Department of Twin Research and Genetic 
Epidemiology, King’s College London, London SE1 7EH, UK. 57Department of Cardiology, 
University Medical Center Groningen, University of Groningen, 9700RB Groningen, The 
Netherlands. 58Netherlands Consortium for Healthy Aging (NCHA), 3015GE Rotterdam, 
The Netherlands. 59Department of Epidemiology, Erasmus MC University Medical Center, 
3015GE Rotterdam, The Netherlands. 60Department of Internal Medicine, Erasmus MC 
University Medical Center, 3015GE Rotterdam, The Netherlands. 61QIMR Berghofer 
Medical Research Institute, Brisbane, Queensland 4006, Australia. 62Laboratory of 
Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, 
Maryland 20892, USA. 63Institute for Medical Informatics, Biometry and Epidemiology 
(IMIBE), University Hospital Essen, 45147 Essen, Germany. 64Oxford Centre for Diabetes, 
Endocrinology and Metabolism, University of Oxford, Oxford OX3 7LJ, UK. 65Department 
of Genomics of Common Disease, School of Public Health, Imperial College London, 
Hammersmith Hospital, London W12 0NN, UK. 66Department of Clinical Sciences, 
Genetic & Molecular Epidemiology Unit, Lund University Diabetes Center, Skåne 
University Hospital, Malmö 205 02, Sweden. 67Department of Public Health and Clinical 
Medicine, Unit of Medicine, Umeå University, Umeå 901 87, Sweden. 68Department of 
Odontology, Umeå University, Umeå 901 85, Sweden. 69University of Eastern Finland, 
FI-70210 Kuopio, Finland. 70Atherosclerosis Research Unit, Center for Molecular 
Medicine, Department of Medicine, Karolinska Institutet, Stockholm 17176, Sweden. 
71Division of Biostatistics, Washington University School of Medicine, St Louis, Missouri 
63110, USA. 72Translational Gerontology Branch, National Institute on Aging, Baltimore, 
Maryland 21225, USA. 73Interfaculty Institute for Genetics and Functional Genomics, 
University Medicine Greifswald, D-17475 Greifswald, Germany. 74Department of 
Cardiology, Leiden University Medical Center, 2300 RC Leiden, The Netherlands. 
75Department of Gerontology and Geriatrics, Leiden University Medical Center, 2300 RC 
Leiden, The Netherlands. 76Experimental Cardiology Laboratory, Division Heart and 
Lungs, University Medical Center Utrecht, 3584 CX Utrecht, The Netherlands. 
77Department of Medical Genetics, University Medical Center Utrecht, 3584 CX Utrecht, 
The Netherlands. 78Department of Endocrinology, University of Groningen, University 
Medical Center Groningen, 9700 RB Groningen, The Netherlands. 79Core Genotyping 
Facility, SAIC-Frederick, Inc., NCI-Frederick, Frederick, Maryland 21702, USA. 80CNRS 
UMR 8199, F-59019 Lille, France. 81European Genomic Institute for Diabetes, F-59000 
Lille, France. 82Université de Lille 2, F-59000 Lille, France. 83Department of Epidemiology 
and Biostatistics, Imperial College London, London W2 1PG, UK. 84Center for Medical 
Systems Biology, 2300 RC Leiden, The Netherlands. 85Institute of Genetic Epidemiology, 
Helmholtz Zentrum München - German Research Center for Environmental Health, 
D-85764 Neuherberg, Germany. 86School of Health and Social Studies, Dalarna 
University, SE-791 88 Falun, Sweden. 87PathWest Laboratory Medicine of Western 
Australia, Nedlands, Western Australia 6009, Australia. 88Department of Haematology, 
University of Cambridge, Cambridge CB2 0PT, UK. 89NHS Blood and Transplant, 
Cambridge CB2 0PT, UK. 90Geriatric Unit, Azienda Sanitaria Firenze (ASF), 50125 
Florence, Italy. 91USC-Office of Population Studies Foundation, Inc., University of San 
Carlos, Cebu City 6000, Philippines. 92Department of Genetics, Texas Biomedical 
Research Institute, San Antonio, Texas 78227, USA. 93Genomics Research Centre, 
Institute of Health and Biomedical Innovation, Queensland University of Technology, 
Brisbane, Queensland 4001, Australia. 94Department of Medical Sciences, 
Endocrinology, Diabetes and Metabolism, Uppsala University, Uppsala 75185, Sweden. 
95Division of Endocrinology, Diabetes and Metabolism, Ulm University Medical Centre, 
D-89081 Ulm, Germany. 96Integrated Research and Treatment Center (IFB) Adiposity 
Diseases, University of Leipzig, D-04103 Leipzig, Germany. 97Department of Medicine, 
University of Leipzig, D-04103 Leipzig, Germany. 98Department of Medical Statistics and 
Bioinformatics, Leiden University Medical Center, 2300 RC Leiden, The Netherlands. 
99Medical Genomics and Metabolic Genetics Branch, National Human Genome Research 
Institute, NIH, Bethesda, Maryland 20892, USA. 100LifeLines Cohort Study, University 
Medical Center Groningen, University of Groningen, 9700 RB Groningen, The 
Netherlands. 101Department of Biology, Norwegian University of Science and Technology, 
7491 Trondheim, Norway. 102Department of Pediatrics, University of California Los 
Angeles, Torrance, California 90502, USA. 103Transgenomics Institute, Los Angeles 
Biomedical Research Institute, Torrance, California 90502, USA. 104Clinical Trial Service 
Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, 
University of Oxford, Oxford OX3 7LF, UK. 105Department of Dietetics-Nutrition, 
Harokopio University, 17671 Athens, Greece. 106Medical Research Institute, University of 
Dundee, Ninewells Hospital and Medical School, Dundee DD1 9SY, UK. 107Institute for 
Molecular Medicine, University of Helsinki, FI-00014 Helsinki, Finland. 108Analytic and 
Translational Genetics Unit, Massachusetts General Hospital and Harvard Medical School, 
Boston, Massachusetts 02114, USA. 109Laboratory of Epidemiology and Population 
Sciences, National Institute on Aging, NIH, Bethesda, Maryland 20892, USA. 
110Department of Public Health and Caring Sciences, Geriatrics, Uppsala University, 
Uppsala 75185, Sweden. 111Division of Cardiovascular Epidemiology, Institute of 
Environmental Medicine, Karolinska Institutet, Stockholm, Sweden, Stockholm 17177, 
Sweden. 112Kaiser Permanente, Division of Research, Oakland, California 94612, USA. 
113Service of Therapeutic Education for Diabetes, Obesity and Chronic Diseases, Geneva 
University Hospital, Geneva CH-1211, Switzerland. 114Department of Cardiovascular 
Sciences, University of Leicester, Glenfield Hospital, Leicester LE3 9QP, UK. 115National 
Institute for Health Research (NIHR) Leicester Cardiovascular Biomedical Research Unit, 
Glenfield Hospital, Leicester LE3 9QP, UK. 116Department of Nephrology, University 
Hospital Regensburg, D-93053 Regensburg, Germany. 117Department of Psychiatry and 
Psychotherapy, University Medicine Greifswald, HELIOS-Hospital Stralsund, D-17475 
Greifswald, Germany. 118German Center for Neurodegenerative Diseases (DZNE), 
Rostock, Greifswald, D-17475 Greifswald, Germany. 119Research Unit of Molecular 
Epidemiology, Helmholtz Zentrum München - German Research Center for 
Environmental Health, D-85764 Neuherberg, Germany. 120German Center for Diabetes 




Research (DZD), 85764 Neuherberg, Germany. 121Department of Medicine III, University 
Hospital Carl Gustav Carus, Technische Universität Dresden, D-01307 Dresden, 
Germany. 122Institut inter Régional pour la Santé, Synergies, F-37520 La Riche, France. 
123Department of Public Health and Clinical Medicine, Unit of Nutritional Research, Umeå 
University, Umeå 90187, Sweden. 124Department of Psychiatry, University of Groningen, 
University Medical Center Groningen, 9700RB Groningen, The Netherlands. 125Kuopio 
Research Institute of Exercise Medicine, 70100 Kuopio, Finland. 126MRC Human Genetics 
Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western 
General Hospital, Edinburgh EH4 2XU, UK. 127National Heart, Lung, and Blood Institute, 
the Framingham Heart Study, Framingham, Massachusetts 01702, USA. 128Department 
of Neurology, Boston University School of Medicine, Boston, Massachusetts 02118, USA. 
129Faculty of Psychology and Education, VU University Amsterdam, 1081BT Amsterdam, 
The Netherlands. 130Deutsches Forschungszentrum für Herz-Kreislauferkrankungen 
(DZHK) (German Research Centre for Cardiovascular Research), Munich Heart Alliance, 
D-80636 Munich, Germany. 131Deutsches Herzzentrum München, Technische 
Universität München, D-80636 Munich, Germany. 132Department of Public Health and 
General Practice, Norwegian University of Science and Technology, Trondheim 7489, 
Norway. 133Biological Psychology, VU University Amsterdam, 1081BT Amsterdam, The 
Netherlands. 134Department of Pulmonary Physiology and Sleep Medicine, Nedlands, 
Western Australia 6009, Australia. 135School of Medicine and Pharmacology, University of 
Western Australia, Crawley 6009, Australia. 136Uppsala University, Department of 
Immunology, Genetics, Pathology, SciLifeLab, Rudbeck Laboratory, SE-751 85 Uppsala, 
Sweden. 137Hjelt Institute Department of Public Health, University of Helsinki, FI-00014 
Helsinki, Finland. 138Department of Internal Medicine I, Ulm University Medical Centre, 
D-89081 Ulm, Germany. 139Finnish Institute of Occupational Health, FI-90100 Oulu, 
Finland. 140Division of Genetic Epidemiology, Department of Medical Genetics, Molecular 
and Clinical Pharmacology, Innsbruck Medical University, 6020 Innsbruck, Austria. 
141Institute of Human Genetics, Helmholtz Zentrum München - German Research Center 
for Environmental Health, D-85764 Neuherberg, Germany. 142Department of Medical 
Sciences, Cardiovascular Epidemiology, Uppsala University, Uppsala 75185, Sweden. 
143Montreal Heart Institute, Montreal, Quebec H1T 1C8, Canada. 144Institute for 
Community Medicine, University Medicine Greifswald, D-17475 Greifswald, Germany. 
145The Genetics of Obesity and Related Metabolic Traits Program, The Icahn School of 
Medicine at Mount Sinai, New York, New York 10029, USA. 146School of Social and 
Community Medicine, University of Bristol, Bristol BS8 2BN, UK. 147Institute of Molecular 
and Cell Biology, University of Tartu, Tartu 51010, Estonia. 148Farr Institute of Health 
Informatics Research, University College London, London NW1 2DA, UK. 149The Center 
for Observational Research, Amgen, Inc., Thousand Oaks, California 91320, USA. 
150Istituto di Ricerca Genetica e Biomedica (IRGB), Consiglio Nazionale delle Ricerche, 
Cagliari, Sardinia 09042, Italy. 151Center for Evidence-based Healthcare, University 
Hospital Carl Gustav Carus, Technische Universität Dresden, D-01307 Dresden, 
Germany. 152Department of Medicine I, University Hospital Grosshadern, 
Ludwig-Maximilians-Universität, D-81377 Munich, Germany. 153Institute of Medical 
Informatics, Biometry and Epidemiology, Chair of Genetic Epidemiology, 
Ludwig-Maximilians-Universität, D-81377 Munich, Germany. 154Department of 
Respiratory Medicine, Sir Charles Gairdner Hospital, Nedlands, Western Australia 6009, 
Australia. 155Laboratory of Genetics, National Institute on Aging, Baltimore, Maryland 
21224, USA. 156Department of Genomics, Life & Brain Center, University of Bonn, 53127 
Bonn, Germany. 157Institute of Human Genetics, University of Bonn, 53127 Bonn, 
Germany. 158Department of Epidemiology, University Medical Center Groningen, 
University of Groningen, 9700 RB Groningen, The Netherlands. 159Department of 
Epidemiology and Biostatistics, Institute for Research in Extramural Medicine, Institute for 
Health and Care Research, VU University Medical Center, 1081BT Amsterdam, The 
Netherlands. 160Department of Internal Medicine, Division of Endocrinology and 
Metabolism, Medical University of Graz, 83036 Graz, Austria. 161Institute of Physiology, 
University Medicine Greifswald, D-17495 Karlsburg, Germany. 162Stanley Center for 
Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 
02142, USA. 163Division of Preventive Medicine, Brigham and Women’s Hospital, Boston, 
Massachusetts 02215, USA. 164Clinical Institute of Medical and Chemical Laboratory 
Diagnostics, Medical University of Graz, Graz 8036, Austria. 165Department of Preventive 
Medicine, Keck School of Medicine, University of Southern California, Los Angeles, 
California 90089, USA. 166National Cancer Institute, Bethesda, Maryland 20892, USA. 
167Icelandic Heart Association, Kopavogur 201, Iceland. 168University of Iceland, 
Reykjavik 101, Iceland. 169Department of Epidemiology, University Medical Center 
Groningen, University of Groningen, 9700 RB Groningen, The Netherlands. 170Molecular 
& Cellular Therapeutics, Royal College of Surgeons in Ireland, 123 St Stephen’s Green, 
Dublin 2, Ireland. 171deCODE Genetics, Amgen Inc., Reykjavik 101, Iceland. 
172Department of Medical Sciences, Molecular Medicine, Uppsala University, Uppsala 
75144, Sweden. 173National Heart and Lung Institute, Imperial College London, London 
SW3 6LY, UK. 174Department of Public Health Sciences, Stritch School of Medicine, 
Loyola University of Chicago, Maywood, Illinois 61053, USA. 175Institute of Epidemiology 
II, Helmholtz Zentrum München - German Research Center for Environmental Health, 
Neuherberg, Germany, D-85764 Neuherberg, Germany. 176Department of Oncology, 
University of Cambridge, Cambridge CB2 0QQ, UK. 177Centre for Bone and Arthritis 
Research, Department of Internal Medicine and Clinical Nutrition, Institute of Medicine, 
Sahlgrenska Academy, University of Gothenburg, Gothenburg 413 45, Sweden. 
178Department of Child and Adolescent Psychiatry/Psychology, Erasmus MC University 
Medical Centre, 3000 CB Rotterdam, The Netherlands. 179Department for Health 
Evidence, Radboud University Medical Centre, 6500 HB Nijmegen, The Netherlands. 
180Department of Genetics, Radboud University Medical Centre, 6500 HB Nijmegen, The 
Netherlands. 181Department of Clinical Pharmacology, William Harvey Research Institute, 
Barts and The London School of Medicine and Dentistry, Queen Mary University of 
London, London EC1M 6BQ, UK. 182Genetics, GlaxoSmithKline, King of Prussia, 
Pennsylvania 19406, USA. 183German Center for Cardiovascular Research, partner site 
Hamburg/Lübeck/Kiel, 23562 Lübeck, Germany. 184Institut für Integrative und 
Experimentelle Genomik, Universität zu Lübeck, D-23562 Lübeck, Germany. 
185Department of Community Medicine, Faculty of Health Sciences, UiT The Arctic 




University of Norway, 9037 Tromsø, Norway. 186MRC Unit for Lifelong Health and Ageing 
at University College London, London WC1B 5JU, UK. 187Diabetes Complications 
Research Centre, Conway Institute, School of Medicine and Medical Sciences, University 
College Dublin, Dublin 4, Ireland. 188Department of Biomedical Sciences, Seoul National 
University College of Medicine, Seoul, Korea. 189Lady Davis Institute, Departments of 
Human Genetics, Epidemiology and Biostatistics, McGill University, Montréal, Québec 
H3T1E2, Canada. 190Cardiothoracic Surgery Unit, Department of Molecular Medicine and 
Surgery, Karolinska Institutet, Stockholm 17176, Sweden. 191Department of Medicine, 
Columbia University College of Physicians and Surgeons, New York 10032, USA. 
192Biosciences Research Division, Department of Primary Industries, Victoria 3083, 
Australia. 193Department of Food and Agricultural Systems, University of Melbourne, 
Victoria 3010, Australia. 194Department of Epidemiology, Harvard School of Public 
Health, Boston, Massachusetts 02115, USA. 195State Key Laboratory of Medical 
Genomics, Shanghai Institute of Hematology, Rui Jin Hospital Affiliated with Shanghai 
Jiao Tong University School of Medicine, Shanghai, China. 196NIHR Oxford Biomedical 
Research Centre, OUH Trust, Oxford OX3 7LE, UK. 197Cardiovascular Research Center, 
Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA. 
198Laboratory for Genotyping Development, RIKEN Center for Integrative Medical 
Sciences, Yokohama 230-0045, Japan. 199Center for Genome Science, National Institute 
of Health, Chungcheongbuk-do, Chungbuk 363-951, Republic of Korea. 200Harvard 
School of Public Health, Department of Biostatistics, Harvard University, Boston, 
Massachusetts 2115, USA. 201Department of Genetics, Howard Hughes Medical Institute, 
Yale University School of Medicine, New Haven, New Haven, Connecticut 06520, USA. 
202College of Information Science and Technology, Dalian Maritime University, Dalian, 
Liaoning 116026, China. 203Nephrology Research, Centre for Public Health, Queen’s 
University of Belfast, Belfast, County Down BT9 7AB, UK. 204Section of General Internal 
Medicine, Boston University School of Medicine, Boston, Massachusetts 02118, USA. 
205Department of Statistics, University of Oxford, 1 South Parks Road, Oxford OX1 3TG, 
UK. 206MRC Harwell, Harwell Science and Innovation Campus, Harwell OX11 0QG, UK. 
207Institute of Health and Biomedical Innovation, Queensland University of Technology, 
Brisbane, Queensland 4059, Australia. 208Laboratory for Statistical Analysis, RIKEN 
Center for Integrative Medical Sciences, Yokohama 230-0045, Japan. 209Department of 
Human Genetics and Disease Diversity, Graduate School of Medical and Dental Sciences, 
Tokyo Medical and Dental University, 113-8510 Tokyo, Japan. 210Genome Institute of 
Singapore, Agency for Science, Technology and Research, 138672 Singapore. 
211Department of Biomedical Engineering and Computational Science, Aalto University 
School of Science, Helsinki FI-00076, Finland. 212Department of Medicine, Division of 
Nephrology, Helsinki University Central Hospital, FI-00290 Helsinki, Finland. 
213Folkhälsan Institute of Genetics, Folkhälsan Research Center, FI-00290 Helsinki, 
Finland. 214Laboratory for Cardiovascular Diseases, RIKEN Center for Integrative Medical 
Sciences, Yokohama 230-0045, Japan. 215Division of Disease Diversity, Bioresource 
Research Center, Tokyo Medical and Dental University, 113-8510 Tokyo, Japan. 
216Division of Epidemiology, Department of Medicine; Vanderbilt Epidemiology Center; 
and Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, 
Tennessee 37075, USA. 217Nuffield Department of Obstetrics & Gynaecology, University 
of Oxford, Oxford OX3 7BN, UK. 218Department of Psychiatry, Washington University 
School of Medicine, St Louis, Missouri 63110, USA. 219Department of Epidemiology and 
Public Health, EA3430, University of Strasbourg, Faculty of Medicine, Strasbourg, France. 
220Department of Internal Medicine, University Medical Center Groningen, University of 
Groningen, 9700RB Groningen, The Netherlands. 221Pathology and Laboratory Medicine, 
The University of Western Australia, Perth, Western Australia 6009, Australia. 
222Cedars-Sinai Diabetes and Obesity Research Institute, Los Angeles, California 90048, 
USA. 223Institute of Social and Preventive Medicine (IUMSP), Centre Hospitalier 
Universitaire Vaudois and University of Lausanne, 1010 Lausanne, Switzerland. 
224Ministry of Health, Victoria, Republic of Seychelles. 225University of Milano, Bicocca, 
20126, Italy. 226Harvard Medical School, Boston, Massachusetts 02115, USA. 227Center 
for Human Genetics Research, Vanderbilt University Medical Center, Nashville, Tennessee 
37203, USA. 228Department of Molecular Physiology and Biophysics, Vanderbilt 
University, Nashville, Tennessee 37232, USA. 229Department of Biostatistics, Boston 
University School of Public Health, Boston, Massachusetts 02118, USA. 230Department of 
Health Sciences, University of Milano, I 20142, Italy. 231Fondazione Filarete, Milano I 
201339, Italy. 232Department of Public Health and Primary Care, University of Cambridge, 
Cambridge CB1 8RN, UK. 233Julius Center for Health Sciences and Primary Care, 
University Medical Center Utrecht, 3584 CX Utrecht, The Netherlands. 234Institute of 
Cardiovascular and Medical Sciences, College of Medical, Veterinary and Life Sciences, 
University of Glasgow, Glasgow G12 8TA, UK. 235Clinic of Cardiology, West-German Heart 
Centre, University Hospital Essen, 45147 Essen, Germany. 236Department of General 
Practice and Primary Health Care, University of Helsinki, FI-00290 Helsinki, Finland. 
237Unit of General Practice, Helsinki University Central Hospital, Helsinki 00290, Finland. 
238Department of Internal Medicine B, University Medicine Greifswald, D-17475 
Greifswald, Germany. 239DZHK (Deutsches Zentrum für Herz-Kreislaufforschung - 
German Centre for Cardiovascular Research), partner site Greifswald, D-17475 
Greifswald, Germany. 240Department of Internal Medicine, University of Pisa, 56100 Pisa, 
Italy. 241National Research Council Institute of Clinical Physiology, University of Pisa, 
56124 Pisa, Italy. 242Department of Cardiology, Toulouse University School of Medicine, 
Rangueil Hospital, 31400 Toulouse, France. 243Robertson Center for Biostatistics, 
University of Glasgow, Glasgow G12 8QQ, UK. 244UWI Solutions for Developing Countries, 
The University of the West Indies, Mona, Kingston 7, Jamaica. 245NorthShore University 
HealthSystem, Evanston, IL 60201, University of Chicago, Chicago, Illinois, USA. 246Leeds 
MRC Medical Bioinformatics Centre, University of Leeds, Leeds LS2 9LU, UK. 247Institute 
of Biomedical & Clinical Science, University of Exeter, Barrack Road, Exeter EX2 5DW, UK. 
248Center for Biomedicine, European Academy Bozen, Bolzano (EURAC), Bolzano 39100, 
Italy (affiliated institute of the University of Lübeck, D-23562 Lübeck, Germany). 
249Division of Genomic Medicine, National Human Genome Research Institute, National 
Institutes of Health, Bethesda, Maryland 20892, USA. 250Institute of Cardiovascular 
Science, University College London, London WC1E 6BT, UK. 251Department of Vascular 
Medicine, Academic Medical Center, 1105 AZ Amsterdam, The Netherlands. 252Centre 




for Cardiovascular Genetics, Institute Cardiovascular Sciences, University College London, 
London WC1E 6, UK. 253Cardiovascular Genetics Division, Department of Internal 
Medicine, University of Utah, Salt Lake City, Utah 84108, USA. 254Sansom Institute for 
Health Research, University of South Australia, Adelaide 5000, South Australia, Australia. 
255School of Population Health, University of South Australia, Adelaide 5000, South 
Australia, Australia. 256South Australian Health and Medical Research Institute, Adelaide, 
South Australia 5000, Australia. 257Population, Policy, and Practice, University College 
London Institute of Child Health, London WC1N 1EH, UK. 258Hannover Unified Biobank, 
Hannover Medical School, Hannover, D-30625 Hannover, Germany. 259National Institute 
for Health and Welfare, FI-90101 Oulu, Finland. 260MRC Health Protection Agency (HPA) 
Centre for Environment and Health, School of Public Health, Imperial College London, 
London W2 1PG, UK. 261Unit of Primary Care, Oulu University Hospital, FI-90220 Oulu, 
Finland. 262Biocenter Oulu, University of Oulu, FI-90014 Oulu, Finland. 263Institute of 
Health Sciences, University of Oulu, FI-90014 Oulu, Finland. 264Durrer Center for 
Cardiogenetic Research, Interuniversity Cardiology Institute Netherlands (ICIN), 3501 DG 
Utrecht, The Netherlands. 265Interuniversity Cardiology Institute of the Netherlands 
(ICIN), 3501 DG Utrecht, The Netherlands. 266Unit of Primary Health Care/General 
Practice, Oulu University Hospital, FI-90220 Oulu, Finland. 267Department of Urology, 
Radboud University Medical Centre, 6500 HB Nijmegen, The Netherlands. 268Imperial 
College Healthcare NHS Trust, London W12 0HS, UK. 269Department of Epidemiology 
and Public Health, University College London, London WC1E 6BT, UK. 270Department of 
Biological and Social Epidemiology, University of Essex, Wivenhoe Park, Colchester, Essex 
CO4 3SQ, UK. 271Department of Medicine, Kuopio University Hospital and University of 
Eastern Finland, FI-70210 Kuopio, Finland. 272Department of Physiology, Institute of 
Biomedicine, University of Eastern Finland, Kuopio Campus, FI-70211 Kuopio, Finland. 
273Department of Clinical Physiology and Nuclear Medicine, Kuopio University Hospital 
and University of Eastern Finland, FI-70210 Kuopio, Finland. 274Department of Clinical 
Chemistry, Fimlab Laboratories and School of Medicine University of Tampere, FI-33520 
Tampere, Finland. 275Steno Diabetes Center A/S, Gentofte DK-2820, Denmark. 276Lund 
University Diabetes Centre and Department of Clinical Science, Diabetes & Endocrinology 
Unit, Lund University, Malmö 221 00, Sweden. 277Institut Universitaire de Cardiologie et 
de Pneumologie de Québec, Faculty of Medicine, Laval University, Quebec, QC G1V 0A6, 
Canada. 278Institute of Nutrition and Functional Foods, Laval University, Quebec, QC G1V 
0A6, Canada. 279Department of Biostatistics, University of Washington, Seattle, 
Washington 98195, USA. 280Department of Surgery, University Medical Center Utrecht, 
3584 CX Utrecht, The Netherlands. 281Department of Biostatistics, University of Liverpool, 
Liverpool L69 3GA, UK. 282Department of Pediatrics, University of Iowa, Iowa City, Iowa 
52242, USA. 283Illumina, Inc, Little Chesterford, Cambridge CB10 1XL, UK. 284University 
of Groningen, University Medical Center Groningen, Department of Pulmonary Medicine 
and Tuberculosis, Groningen, The Netherlands. 285Department of Neurology, General 
Central Hospital, Bolzano 39100, Italy. 286Department of Clinical Physiology and Nuclear 
Medicine, Turku University Hospital, FI-20521 Turku, Finland. 287Research Centre of 
Applied and Preventive Cardiovascular Medicine, University of Turku, FI-20521 Turku, 
Finland. 288Human Genomics Laboratory, Pennington Biomedical Research Center, 
Baton Rouge, Louisiana 70808, USA. 289Université de Montréal, Montreal, Quebec H1T 
1C8, Canada. 290Center for Systems Genomics, The Pennsylvania State University, 
University Park, Pennsylvania 16802, USA. 291Croatian Centre for Global Health, Faculty 
of Medicine, University of Split, 21000 Split, Croatia. 292South Carelia Central Hospital, 
53130 Lappeenranta, Finland. 293Paul Langerhans Institute Dresden, German Center for 
Diabetes Research (DZD), 01307 Dresden, Germany. ?*4International Centre for 
Circulatory Health, Imperial College London, London W2 1PG, UK. ?*°Division of 
Endocrinology, Diabetes and Nutrition, University of Maryland School of Medicine, 
Baltimore, Maryland 21201, USA. 2°°Program for Personalized and Genomic Medicine, 
University of Maryland School of Medicine, Baltimore, Maryland 21201, USA. 2°’ Geriatric 
Research and Education Clinical Center, Vetrans Administration Medical Center, 
Baltimore, Maryland 21201, USA. 2??Helsinki University Central Hospital Heart and Lung 
Center, Department of Medicine, Helsinki University Central Hospital, Fl-O0290 Helsinki, 
Finland. 222Sorbonne Universités, UPMC Univ Paris 06, UMR S 1166, F-75013 Paris, 
France. °°°INSERM, UMR S 1166, Team Genomics and Physiopathology of 
Cardiovascular Diseases, F-75013 Paris, France. °°!Institute for Cardiometabolism And 
utrition (ICAN), F-75013 Paris, France. °*Department of Kinesiology, Laval University, 
Quebec QC G1V OA6, Canada. °°°Dipartimento di Scienze Farmacologiche e 
Biomolecolari, Universita di Milano & Centro Cardiologico Monzino, Instituto di Ricovero e 
Cura a Carattere Scientifico, Milan 20133, Italy. ?°’Department of Food Science and 
utrition, Laval University, Quebec QC G1V OA6, Canada. 2°°Department of Internal 
edicine, University Hospital (CHUV) and University of Lausanne, Lausanne 1011, 
Switzerland. °°°Department of Nutrition, University of North Carolina, Chapel Hill, North 
Carolina 27599, USA. °°’ Institut Pasteur de Lille; INSERM, U744; Université de Lille 2; 
F-59000 Lille, France. ?°°Department of Cardiology, Division Heart and Lungs, University 

Medical Center Utrecht, 3584 CX Utrecht, The Netherlands. ³⁰⁹Department of Medicine,
Stanford University School of Medicine, Palo Alto, California 94304, USA. ³¹⁰Lee Kong
Chian School of Medicine, Imperial College London and Nanyang Technological
University, Singapore, 637553 Singapore, Singapore. ³¹¹Department of Internal
Medicine I, Ulm University Medical Centre, D-89081 Ulm, Germany. ³¹²Health Science
Center at Houston, University of Texas, Houston, Texas 77030, USA. ³¹³Department of
Medicine, Division of Genetics, Brigham and Women’s Hospital, Harvard Medical School,
Boston, Massachusetts 02115, USA. ³¹⁴Department of Epidemiology, University Medical
Center Utrecht, 3584 CX Utrecht, The Netherlands. ³¹⁵School of Population Health, The
University of Western Australia, Nedlands, Western Australia 6009, Australia. ³¹⁶Albert
Einstein College of Medicine, Department of Epidemiology and Population Health, Belfer
1306, New York 10461, USA. ³¹⁷Center for Human Genetics, Division of Public Health
Sciences, Wake Forest School of Medicine, Winston-Salem, North Carolina 27157, USA.
³¹⁸Synlab Academy, Synlab Services GmbH, 68163 Mannheim, Germany.
³¹⁹Department of Clinical Medicine, Copenhagen University, 2200 Copenhagen,
Denmark. ³²⁰Department of Clinical Genetics, Erasmus MC University Medical Center,
3000 CA Rotterdam, The Netherlands. ³²¹Finnish Diabetes Association, Kirjoniementie
15, FI-33680 Tampere, Finland. ³²²Pirkanmaa Hospital District, FI-33521 Tampere,


12 FEBRUARY 2015 | VOL 518 | NATURE | 205 


©2015 Macmillan Publishers Limited. All rights reserved 




Finland. ³²³Center for Non-Communicable Diseases, Karachi, Pakistan. ³²⁴Department
of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA. ³²⁵BHF
Glasgow Cardiovascular Research Centre, Division of Cardiovascular and Medical
Sciences, University of Glasgow, Glasgow G12 8TA, UK. ³²⁶Icahn Institute for Genomics
and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York
10580, USA. ³²⁷Faculty of Medicine, University of Iceland, Reykjavik 101, Iceland.
³²⁸Institute for Health Research, University Hospital of La Paz (IdiPaz), 28046 Madrid,
Spain. ³²⁹Diabetes Research Group, King Abdulaziz University, 21589 Jeddah, Saudi
Arabia. ³³⁰Centre for Vascular Prevention, Danube-University Krems, 3500 Krems,
Austria. ³³¹Department of Public Health and Clinical Nutrition, University of Eastern
Finland, Finland. ³³²Research Unit, Kuopio University Hospital, FI-70210 Kuopio, Finland.
³³³Institute of Cellular Medicine, Newcastle University, Newcastle NE1 7RU, UK.
³³⁴Institute of Clinical Chemistry and Laboratory Medicine, University Medicine
Greifswald, D-17475 Greifswald, Germany. ³³⁵Institute of Medical Informatics, Biometry
and Epidemiology, Chair of Epidemiology, Ludwig-Maximilians-Universität, D-85764
Munich, Germany. ³³⁶Klinikum Grosshadern, D-81377 Munich, Germany. ³³⁷Institute of
Epidemiology I, Helmholtz Zentrum München - German Research Center for
Environmental Health, Neuherberg, Germany, D-85764 Neuherberg, Germany.
³³⁸Department of Pulmonology, University Medical Center Utrecht, 3584 CX Utrecht, The
Netherlands. ³³⁹Princess Al-Jawhara Al-Brahim Centre of Excellence in Research of




Hereditary Disorders (PACER-HD), King Abdulaziz University, 21589 Jeddah, Saudi
Arabia. ³⁴⁰Division of Population Health Sciences & Education, St George’s, University of
London, London SW17 0RE, UK. ³⁴¹Oxford NIHR Biomedical Research Centre, Oxford
University Hospitals NHS Trust, Oxford OX3 7LJ, UK. ³⁴²Clinical Epidemiology, Integrated
Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena
University Hospital, 07743 Jena, Germany. ³⁴³Department of Human Genetics, University
of Michigan, Ann Arbor, Michigan 48109, USA. ³⁴⁴Service of Medical Genetics, CHUV
University Hospital, 1011 Lausanne, Switzerland. ³⁴⁵University of Cambridge Metabolic
Research Laboratories, Institute of Metabolic Science, Addenbrooke’s Hospital,
Cambridge CB2 0QQ, UK. ³⁴⁶NIHR Cambridge Biomedical Research Centre, Institute of
Metabolic Science, Addenbrooke’s Hospital, Cambridge CB2 0QQ, UK. ³⁴⁷Carolina Center
for Genome Sciences, University of North Carolina at Chapel Hill, Chapel Hill, North
Carolina 27599, USA. ³⁴⁸The Mindich Child Health and Development Institute, Icahn
School of Medicine at Mount Sinai, New York, New York 10029, USA.


†Present address: Second Floor, B-dong, AICT Building, 145 Gwanggyo-ro,
Yeongyong-gu, Suwon-si, Gyeonggi-do, 443-270, South Korea.

‡A list of authors and affiliations appears in the Supplementary Information.
*These authors contributed equally to this work. 

§These authors jointly supervised this work. 




METHODS 

Study design. We conducted a two-stage meta-analysis to identify BMI-associated
loci in European adults (Extended Data Fig. 1 and Extended Data Table 1). In stage 1
we performed a meta-analysis of 80 GWAS (n = 234,069); stage 2 incorporated
data from 34 additional studies (n = 88,137) genotyped using Metabochip
(Supplementary Tables 1-3). Secondary meta-analyses were also conducted for: (1) all
ancestries, (2) European men, (3) European women, and (4) European population-based
studies. The total number of subjects and SNPs included in each stage for all analyses
is shown in Extended Data Table 1. No statistical methods were used to predetermine
sample size.

Phenotype. BMI, measured or self-reported weight in kg per height in metres squared
(Supplementary Tables 1 and 3), was adjusted for age, age squared, and any
necessary study-specific covariates (for example, genotype-derived principal
components) in a linear regression model. The resulting residuals were transformed to
approximate normality using inverse normal scores. For studies with no known
related individuals, residuals were calculated separately by sex and case/control status.
For family-based studies, residuals were calculated with men and women together,
adding sex as an additional covariate in the linear regression model. Relatedness was
accounted for in a study-specific manner (Supplementary Table 2).
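
The transformation to inverse normal scores can be illustrated with a short sketch. The text does not say which rank offset the studies used, so the Blom offset (0.375) below is an assumption, as are the function name and the toy residuals; ties are broken by input order, which real data would need to handle explicitly.

```python
from statistics import NormalDist


def inverse_normal_scores(residuals, offset=0.375):
    """Map residuals to normal quantiles by rank (rank-based inverse normal)."""
    n = len(residuals)
    # Indices of the residuals from smallest to largest.
    order = sorted(range(n), key=lambda i: residuals[i])
    std_normal = NormalDist()
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        # Blom-type plotting position; offset=0.375 is an assumed choice.
        quantile = (rank - offset) / (n - 2 * offset + 1)
        scores[idx] = std_normal.inv_cdf(quantile)
    return scores


# Toy residuals from a hypothetical regression of BMI on age and age squared.
example = inverse_normal_scores([2.1, -0.4, 7.9, 0.3, -1.2])
```

The skewed toy residuals keep their ordering but are mapped onto symmetric normal quantiles, which is the property the association tests rely on.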

Sample quality control, imputation and association. Following study-specific
quality control measures (Supplementary Table 2), all contributing GWAS common
SNPs were imputed using the HapMap phase II CEU reference panel for
European-descent studies, and CEU+YRI+CHB+JPT HapMap release 22 for the
African-American and Hispanic GWAS. Directly genotyped (GWAS and Metabochip) and
imputed variants (GWAS only) were then tested for association with the inverse
normally transformed BMI residuals using linear regression assuming an additive
genetic model. Quality control following study level analyses was conducted
following procedures outlined elsewhere.

Meta-analysis. Fixed effects meta-analyses were conducted using the inverse
variance-weighted method implemented in METAL. Study-specific GWAS results as
well as GWAS meta-analysis results were corrected for genomic control using all
SNPs. Study-specific Metabochip results as well as Metabochip meta-analysis results
were genomic-control-corrected using 4,425 SNPs included on Metabochip for
replication of associations with QT-interval, a phenotype not correlated with BMI, after
pruning of SNPs within 500 kb of an anthropometry replication SNP. The final
meta-analysis combined the genomic-control-corrected GWAS and Metabochip
meta-analysis results.
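
A minimal sketch of inverse-variance-weighted pooling with genomic control follows. It is not METAL itself: the function names and the two-study example are hypothetical, and applying genomic control by inflating each study's variance by max(λ, 1) is an assumed convention rather than something stated in the text.

```python
import math
from statistics import NormalDist, median

STD_NORMAL = NormalDist()
# Median of a 1-d.f. chi-square distribution (about 0.455).
CHI2_1DF_MEDIAN = STD_NORMAL.inv_cdf(0.75) ** 2


def genomic_control_lambda(z_scores):
    """Ratio of the observed median chi-square to its null expectation."""
    return median(z * z for z in z_scores) / CHI2_1DF_MEDIAN


def ivw_fixed_effects(betas, ses, lambdas=None):
    """Pool per-study effects with inverse-variance weights."""
    if lambdas is None:
        lambdas = [1.0] * len(betas)
    # Assumed GC convention: inflate each variance by max(lambda, 1).
    variances = [se ** 2 * max(lam, 1.0) for se, lam in zip(ses, lambdas)]
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    p_value = 2.0 * STD_NORMAL.cdf(-abs(beta / se))
    return beta, se, p_value


# Two hypothetical studies reporting the same SNP.
pooled_beta, pooled_se, pooled_p = ivw_fixed_effects([0.08, 0.10], [0.02, 0.04])
```

The more precise study dominates the pooled estimate, which is why the pooled beta sits closer to 0.08 than to 0.10.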

Identification of novel loci. We used a distance criterion of ±500 kb surrounding
each GWS peak (P < 5 × 10⁻⁸) to define independent loci and to place our results
in the context of previous studies, including our previous GIANT meta-analyses. Of
several locus models tested, this definition most closely reflected the loci defined by
approximate conditional analysis using GCTA (Tables 1 and 2, respectively). Current
index SNPs falling within 500 kb of a SNP previously associated with BMI, weight,
extreme obesity or body fat percentage were considered previously identified.
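
The distance criterion can be sketched as a greedy pass over genome-wide significant SNPs, most significant first. The greedy order and the function name are assumptions; the two rs-numbered rows use positions and P values from Extended Data Table 2, and `nearby_snp` is a made-up entry added to show a SNP being absorbed into an existing locus.

```python
def define_loci(snps, window=500_000, p_threshold=5e-8):
    """Pick index SNPs so that no two on one chromosome lie within `window` bp.

    `snps` is a list of (name, chromosome, position, p_value) tuples.
    """
    significant = sorted(
        (s for s in snps if s[3] < p_threshold), key=lambda s: s[3]
    )
    index_snps = []
    for name, chrom, pos, p_value in significant:
        # Keep the SNP only if it is outside the window of every chosen peak.
        if all(chrom != c or abs(pos - q) > window for _, c, q, _ in index_snps):
            index_snps.append((name, chrom, pos, p_value))
    return index_snps


candidates = [
    ("rs1558902", 16, 52_361_075, 7.51e-153),
    ("nearby_snp", 16, 52_400_000, 1e-20),  # hypothetical, within 500 kb of FTO
    ("rs3888190", 16, 28_796_987, 3.14e-23),
]
loci = define_loci(candidates)
```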
Characterization of BMI-associated SNP effects. To investigate potential sources
of heterogeneity between groups we compared the effect estimates of our 97 GWS
SNPs for men versus women of European ancestry and Europeans versus
non-Europeans. To address the effects of studies ascertained on a specific disease or
phenotype on our results we also compared the effect estimates from European-ancestry
population-based studies with the following European-descent subsets
of studies: (1) non-population-based studies (that is, those ascertained on a specific
disease or phenotype); (2) type 2 diabetes cases; (3) type 2 diabetes controls;
(4) combined type 2 diabetes cases and controls; (5) CAD cases; (6) CAD controls; and
(7) combined CAD cases and controls (Supplementary Tables 10 and 11). We also
tested for heterogeneity of effect estimates between our European sex-combined
meta-analysis and results from recent GWAS meta-analyses for BMI in individuals
of African or east Asian ancestry (Supplementary Table 9). Heterogeneity was
assessed as described previously. A Bonferroni-corrected P < 5 × 10⁻⁴ (corrected
for 97 tests) was used to assess significance. For heterogeneity tests assessing effects of
ascertainment, we also used a 5% FDR threshold to assess significance of
heterogeneity statistics (Supplementary Table 11).

Fine-mapping. We compared the meta-analysis results and credible sets of SNPs
likely to contain the causal variant, based on the method described previously, across
the European-only, non-European, and all ancestries sex-combined meta-analyses.
For each index SNP falling within a Metabochip fine-mapping region (27 for BMI), all
SNPs available within 500 kb on either side of the index SNP were selected. Effect size
estimates and standard errors for each SNP were converted to approximate Bayes’
factors according to the method described previously. All approximate Bayes’
factors were then summed across the 1-megabase (Mb) region and the proportion of
the posterior odds of being the causal variant was calculated for each variant
(approximate Bayes’ factor for SNPᵢ / sum of approximate Bayes’ factors for the region). The
set of SNPs that accounts for 99% of posterior odds of association in the region
denotes the set most likely to contain the causal variant for that association region
(Supplementary Table 12).
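
The credible-set construction can be sketched as below. The text cites the approximate Bayes' factor method without giving its form, so the Wakefield-style expression and the prior variance of 0.04 are assumptions, and the three-SNP region is hypothetical. Bayes' factors are handled on the log scale to avoid overflow at very strong signals.

```python
import math


def log_abf(beta, se, prior_variance=0.04):
    """Log approximate Bayes' factor for association (assumed Wakefield form)."""
    variance = se ** 2
    shrink = prior_variance / (variance + prior_variance)
    z_squared = (beta / se) ** 2
    return 0.5 * math.log(1.0 - shrink) + 0.5 * z_squared * shrink


def credible_set(snps, coverage=0.99, prior_variance=0.04):
    """Return (name, posterior) pairs covering `coverage` of the posterior.

    `snps` is a list of (name, beta, se) tuples from one region.
    """
    logs = [log_abf(beta, se, prior_variance) for _, beta, se in snps]
    peak = max(logs)
    weights = [math.exp(value - peak) for value in logs]
    total = sum(weights)
    ranked = sorted(
        ((snp[0], w / total) for snp, w in zip(snps, weights)),
        key=lambda pair: -pair[1],
    )
    chosen, cumulative = [], 0.0
    for name, posterior in ranked:
        chosen.append((name, posterior))
        cumulative += posterior
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical region with two strong candidates and one weak SNP.
region = [("snp_a", 0.080, 0.010), ("snp_b", 0.075, 0.010), ("snp_c", 0.020, 0.010)]
region_set = credible_set(region)
```

Here the top SNP alone carries less than 99% of the posterior, so the second SNP is added while the weak one is excluded.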
Cumulative effects, risk prediction and variance explained. We assessed the
cumulative effects of the 97 GWS loci on mean BMI and their ability to predict
obesity (BMI ≥ 30 kg m⁻²) using the c statistic from logistic regression models in the
Health and Retirement Study, a longitudinal study of 26,000 European Americans
50 years or older. The variance explained (VarExp) by each SNP was calculated using
the effect allele frequency (f) and beta (β) from the meta-analyses using the formula
VarExp = β²(1 − f)2f.
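
As a worked example of the per-SNP variance explained, the sketch below takes the formula as β² · 2f(1 − f) and applies it to the FTO index SNP rs1558902, using the effect allele frequency (0.415) and beta (0.082) listed in Extended Data Table 2. The function name is illustrative only.

```python
def variance_explained(beta, effect_allele_frequency):
    """Fraction of phenotypic variance explained by one SNP: beta^2 * 2f(1 - f)."""
    f = effect_allele_frequency
    return beta ** 2 * 2.0 * f * (1.0 - f)


# rs1558902 (FTO): EAF 0.415, beta 0.082 in inverse-normal (s.d.) units.
fto_var_exp = variance_explained(0.082, 0.415)
```

Because beta is on the inverse-normal scale, the result is already a proportion of variance: roughly 0.33% for this single SNP.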

For polygene analyses, the approximate conditional analysis from GCTA was
used to select SNPs using a range of P value thresholds (that is, 5 × 10⁻⁸, 5 × 10⁻⁷,
…, 5 × 10⁻³) based on summary data from the European sex-combined
meta-analysis excluding TwinGene and QIMR studies. We performed a within-family
prediction analysis using full-sib pairs selected from independent families (1,622
pairs from the QIMR cohort and 2,758 pairs from the TwinGene cohort) and then
SNPs at each threshold were used to calculate the percentage of phenotypic variance
explained and predict risk (Extended Data Figs 2 and 3). We then confirmed the
results from population-based prediction and estimation analyses in an independent
sample of unrelated individuals from the TwinGene (n = 5,668) and QIMR
(n = 3,953) studies (Extended Data Fig. 3 and Fig. 1c). The SNP-derived predictor was
calculated using the profile scoring approach implemented in PLINK and estimation
analyses were performed using the all-SNP estimation approach implemented in
GCTA.
Enrichment analysis of Metabochip SNPs selected for replication. The 5,055
SNPs that were included for BMI replication on Metabochip included 1,909
independent SNPs (r² < 0.1 and > 500 kb apart), of which 1,458 displayed directionally
consistent effect estimates with those reported previously. To estimate the number
of Metabochip SNPs truly associated with BMI, we counted the number of SNPs
with directional consistency (DC) between ref. 5 and a meta-analysis of
non-overlapping samples for these 1,909 SNPs. We then calculated DC in the presence of a
mixture of associated and non-associated SNPs assuming P(DC | associated) = 1
and P(DC | not associated) = 0.5. In this formulation, DC = R/2 + S, meaning that
S = 2DC − T, in which T equals the total number of SNPs, R equals the number of
SNPs not associated with BMI, and S equals the number of SNPs associated with
BMI. With DC = 1,458 and T = 1,909, we estimate S to be 2DC − T = 2 × 1,458 −
1,909 = 1,007.
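
The arithmetic above follows from T = R + S together with DC = R/2 + S. A small sketch (function name illustrative) reproduces the estimate and checks it against the stated counts:

```python
def estimate_associated_snps(directionally_consistent, total):
    """Solve T = R + S and DC = R/2 + S for S (associated) and R (not associated)."""
    associated = 2 * directionally_consistent - total
    not_associated = total - associated
    return associated, not_associated


associated, not_associated = estimate_associated_snps(1458, 1909)
```

With the values in the text this gives 1,007 associated and 902 non-associated SNPs, and 902/2 + 1,007 recovers the observed 1,458 directionally consistent SNPs.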
Joint and conditional multiple SNP association analysis. To identify additional
signals in regions of association, we used GCTA, an approach that uses
meta-analysis summary statistics and an LD matrix derived from a reference sample, to
perform approximate joint and conditional SNP association analysis. We used 6,654
unrelated individuals of European ancestry from the ARIC cohort as the reference
sample to approximate conditional P values.
Manual gene annotation and biological description. All genes within 500 kb of
an index SNP were annotated for molecular function, cellular function, and for
evidence of association with BMI-related traits in human or animal model
experiments (Supplementary Table 22). We used several avenues for annotation, including
Spotter (http://csg.sph.umich.edu/boehnke/spotter/), SNIPPER
(http://csg.sph.umich.edu/boehnke/snipper/), PubMed (http://www.ncbi.nlm.nih.gov/pubmed/),
OMIM (http://www.omim.org) and UNIPROT (http://www.uniprot.org/). When
no genes mapped to this interval the nearest gene on each side of the index SNP
was annotated. In examining possible functions of genes in the region, we excluded
any references to GWAS or other genetic association studies. We analysed 405
genes in the 97 GWS loci and manually curated them into 25 biological categories
containing more than three genes.
Functional variants. All variants within 500 kb (HapMap release 22/1000 Genomes
CEU) and in LD (r² > 0.7) with an index SNP were annotated for functional effects
based on RefSeq transcripts using Annovar
(http://www.openbioinformatics.org/annovar/). PhastCon, Grantham, GERP, and PolyPhen
predictions were accessed via the Exome Variant Server
(http://evs.gs.washington.edu/EVS), and from SIFT
(http://sift.jcvi.org/) (Extended Data Table 4).
Copy number variations correlated with BMI index SNPs. To study common
copy number variations, we used a list of copy number variations well-tagged by
SNPs in high LD (r² > 0.8) with deletions in European populations from phase 1
release of the 1000 Genomes Project (Supplementary Table 25).
eQTLs. We examined the cis associations between the 97 GWS SNPs and expression
of nearby genes in whole blood, lymphocytes, skin, liver, omental fat, subcutaneous
fat and brain tissue (Supplementary Table 23). Conditional analyses were
performed by including both the BMI-associated SNP and the most significant
cis-associated SNP for the given transcript. Conditional analyses were conducted for all
data sets, except the brain tissue data set due to limited power. To minimize the
potential for false-positives, only cis associations below a study-specific FDR of 5%
(or 1% for some data sets), in LD with the peak SNP (r² > 0.7) for the transcript, and
with conditional P > 0.05 for the peak SNP, are reported (Extended Data Table 3).
MAGENTA. We used the MAGENTA method to test predefined gene sets for
enrichment at BMI-associated loci. We used the GWAS + Metabochip data as input
and applied default settings.

GRAIL. We used GRAIL to identify genes near BMI-associated loci having
similarities in the published scientific text using PubMed abstracts as of December 2006.
The BMI loci were queried against HapMap release 22 for the European panel, and
we controlled for gene size.

DEPICT. We used DEPICT to identify the most likely causal gene at a given
associated locus, reconstituted gene sets enriched for BMI associations, and tissues and cell
types in which genes from associated loci are highly expressed. To accomplish this,
the method relies on publicly available gene sets (including molecular pathways) and
uses gene expression data from 77,840 gene expression arrays to predict which
other genes are likely to be part of these gene sets, thus combining known
annotations with predicted annotations. For details and negative control analyses please see
Supplementary Methods.

We first clumped the European-only GWAS-based meta-analysis summary
statistics using 500 kb flanking regions, LD r² > 0.1 and excluded SNPs with
P ≥ 5 × 10⁻⁴, which resulted in a list of 590 independent SNPs. HapMap phase
II CEU genotype data were used to compute LD and genomic coordinates were
defined by genome build GRCh37. Because the GWAS meta-analysis was based on
both GWAS and Metabochip studies, there were discrepancies in the index SNPs that
are referenced in Table 1 of the paper and the ones used in DEPICT, which was run on
the GWAS data only. Therefore we forced in GWS index SNPs from the GWAS plus
Metabochip GWA meta-analysis into the DEPICT GWAS-only based analysis. This
enabled a more straightforward comparison of genes in DEPICT loci and genes in
GWS loci highlighted by manual lookups, and did not lead to any significant bias
towards SNPs on Metabochip (data not shown). We forced in 62 of the GWS loci in
Table 1, so all of the 97 SNPs were among the 590 SNPs. The 590 SNPs were further
merged into 511 non-overlapping regions (FDR < 0.05) used in DEPICT analysis.
For additional information on the analysis please refer to Supplementary Methods.
Cross-trait analyses. To explore the relationship between BMI and an array of
cardiometabolic traits and diseases, association results for the 97 BMI index SNPs
were requested from 13 GWAS meta-analysis consortia: DIAGRAM (type 2
diabetes), CARDIoGRAM-C4D (CAD), ICBP (systolic and diastolic blood pressure
(SBP, DBP)), GIANT (waist-to-hip ratio, hip circumference, and waist
circumference, each unadjusted and adjusted for BMI), GLGC (HDL, low density
lipoprotein cholesterol, triglycerides, and total cholesterol), MAGIC (fasting glucose, fasting
insulin, fasting insulin adjusted for BMI, and two-hour glucose), ADIPOGen
(BMI-adjusted adiponectin), CKDgen (urine albumin-to-creatinine ratio (UACR),
estimated glomerular filtration rate, and overall CKD), ReproGen (age at
menarche, age at menopause), and GENIE (diabetic nephropathy). Proxies (r² > 0.8 in
CEU) were used when an index SNP was unavailable.

Enrichment of concordant effects. We compared the effects for the 97 BMI index
SNPs across these related traits using a one-sided binomial test of the number of
concordant effects versus a null expectation of P = 0.5. Concordant and nominally
significant (P < 0.05) SNP effects were similarly tested using a one-sided binomial
test with a null expectation of P = 0.05. We evaluated significance in either test with a
Bonferroni-corrected threshold of P = 0.002 (0.05/23 traits tested).
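
A one-sided binomial test of this kind can be written with an exact upper-tail sum. The count of 70 concordant SNPs out of 97 is a hypothetical input chosen for illustration, not a result from the study, and the function name is illustrative.

```python
from math import comb


def binomial_upper_tail(successes, trials, p_null):
    """Exact one-sided P value: P(X >= successes) for X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null ** k * (1.0 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


bonferroni_threshold = 0.05 / 23  # 23 traits tested, about 0.002

# Hypothetical trait: 70 of the 97 index SNPs have concordant direction.
example_p = binomial_upper_tail(70, 97, 0.5)
is_enriched = example_p < bonferroni_threshold
```

The same function covers the second test by passing the number of concordant, nominally significant SNPs with `p_null=0.05`.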

Joint effects of cross-trait associations. To determine the joint effect of all 97 BMI 
loci on other cardiometabolic phenotypes, we used the meta-regression technique 
from ref. 64 to correlate the effect estimates of the BMI-increasing alleles with effect 
estimates from meta-analyses for each of the metabolic traits from other consortia 
(DIAGRAM, MAGIC, ICBP, GLGC, ADIPOGen, ReproGen and CARDIoGRAM). 
Cross-traits heatmap. To explore observed concordance in effects of BMI loci on
other cardiometabolic and anthropometric traits, we converted the effect estimates
and standard errors (or P values) from meta-analysis to Z-scores oriented with
respect to the BMI-increasing allele, for each of the 97 BMI index SNPs in the
twenty-three traits. We then classified each Z-score as follows to generate a vector of the
Z-score of each trait at each locus:

0 (not significant) if −2 ≤ Z ≤ 2;

1 (significant positive) if Z > 2;

−1 (significant negative) if Z < −2.

Extended Data Fig. 5 displays these locus-trait relationships in a heatmap using 
Euclidean distance and complete linkage clustering to order both loci and traits. 
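
The Z-score orientation and three-way classification that feed the heatmap can be sketched as follows. The function names and the boolean flag for allele orientation are illustrative assumptions; the clustering step itself is not shown.

```python
def oriented_z(beta, se, effect_allele_is_bmi_increasing=True):
    """Z-score for a trait, signed with respect to the BMI-increasing allele."""
    z = beta / se
    return z if effect_allele_is_bmi_increasing else -z


def classify_z(z, cutoff=2.0):
    """Return 1, -1 or 0 for significant positive, significant negative, or neither."""
    if z > cutoff:
        return 1
    if z < -cutoff:
        return -1
    return 0


# Hypothetical trait effects at three loci: (beta, se, allele orientation flag).
locus_effects = [(0.030, 0.010, True), (0.030, 0.010, False), (0.010, 0.010, True)]
trait_vector = [classify_z(oriented_z(b, s, flag)) for b, s, flag in locus_effects]
```

Each trait then contributes one such vector of 97 values, and the vectors are what the distance and linkage calculations operate on.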
Cross-traits bubble plot. We also represent the genetic overlap between other
cardiometabolic traits and BMI susceptibility loci with a bubble plot in which the size of
each bubble is proportional to the fraction of BMI-associated loci for which there was
a significant association (P < 5 × 10⁻⁴). Each pair of bubbles is connected by a line
whose thickness is proportional to the number of significant BMI-increasing loci
overlapping between the traits.

NHGRI GWAS catalogue lookups. We extracted previously reported GWAS
associations within 500 kb of and r² > 0.7 with any BMI-index SNP from the NHGRI
GWAS catalogue (http://www.genome.gov/gwastudies; Supplementary Table 17a,
b). For studies reporting greater than 30 significant hits, additional SNP-trait
associations were pulled from the literature and compared to BMI index SNPs the same
as with other GWAS catalogue studies.

ENCODE/Roadmap. To identify global enrichment of data sets at the BMI-associated
loci we performed permutation-based tests in a subset of 41 open chromatin
(DNase-seq), histone modification (H3K27ac, H3K4me1, H3K4me3 and H3K9ac),
and transcription factor binding data sets from the ENCODE Consortium, Roadmap
Epigenomics Project and, when available, the ENCODE Integrative Analysis
(Supplementary Table 19). We processed Roadmap Epigenomics sequencing data
with multiple biological replicates using MACS2 (ref. 73) and then applied the same
Irreproducible Discovery Rate pipeline used in the ENCODE Integrative Analysis.
Roadmap Epigenomics data with only a single replicate were analysed using MACS2
alone. We examined variants in LD with 97 BMI index SNPs based on r² > 0.7 from
the 1000 Genomes phase 1 version 2 EUR samples. We matched the index SNP at
each locus with 500 variants having no evidence of association (P > 0.5, ~1.2 million
total variants) with a similar distance to the nearest gene (±11,655 bp), number of
variants in LD (±8 variants), and minor allele frequency. Using these pools, we
created 10,000 sets of control variants for each of the 97 loci and identified variants
in LD (r² > 0.7) and within 1 Mb. For each SNP set, we calculated the number of loci
with at least one variant located in a regulatory region under the assumption that one
regulatory variant is responsible for each association signal. We estimated the P value
assuming a sum of binomial distributions to represent the number of index SNPs (or
their LD proxies; r² > 0.7) that overlap a regulatory data set compared to the
expectation observed in the 500 matched control sets. Data sets were considered significantly
enriched if the P value was below a Bonferroni-corrected threshold of 1.2 × 10⁻³,
adjusting for 41 tests.
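
One possible reading of the "sum of binomial distributions" is a Poisson-binomial model: each locus overlaps a regulatory data set with its own probability, estimated from its matched controls, and the P value is the chance of seeing at least the observed number of overlapping loci. The sketch below follows that reading only; it is an assumption rather than the authors' stated procedure, and the per-locus probabilities are made up.

```python
def poisson_binomial_upper_tail(probabilities, observed):
    """P(at least `observed` successes) for independent trials with given probabilities."""
    distribution = [1.0]  # distribution[k] = P(k successes so far)
    for p in probabilities:
        updated = [0.0] * (len(distribution) + 1)
        for k, mass in enumerate(distribution):
            updated[k] += mass * (1.0 - p)
            updated[k + 1] += mass * p
        distribution = updated
    return sum(distribution[observed:])


bonferroni_threshold = 0.05 / 41  # 41 data sets tested, about 1.2e-3

# Hypothetical: five loci, each with a control-derived overlap probability.
control_overlap_rates = [0.10, 0.05, 0.20, 0.10, 0.15]
example_p = poisson_binomial_upper_tail(control_overlap_rates, observed=4)
```

In a full analysis the probability list would have one entry per locus, each taken from the fraction of that locus's control sets that overlap the data set.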
Extended Data Figure 1 | Study design. European studies only. GWAS meta-analysis: 80 studies, 234,069 subjects, 2,550,021 SNPs* (genomic control based on all SNPs). Metabochip meta-analysis: 34 studies, 88,137 subjects, 156,997 SNPs (genomic control based on 4,319 SNPs previously associated with QT-interval; null set). Joint GWAS+MC meta-analysis: 322,154 subjects, 2,554,623 SNPs*. *The SNP counts reflect sample size filter of n = 50,000. Counts represent the primary European sex-combined
analysis. Please see Extended Data Table 1 for counts for secondary analyses.
Extended Data Figure 2 | Genetic characterization of BMI-associated
variants. a, Plot of the cumulative phenotypic variance explained by each locus
ordered by decreasing effect size. b, The relationship between effect size and
allele frequency. Previously identified loci are blue circles and novel loci are red
triangles. c, Quantile-quantile (Q-Q) plot of meta-analysis P values for all
Extended Data Figure 3 | Partitioning the variance in and risk prediction
from SNP-derived predictor. a, b, The analyses were performed using 2,758
full sibling pairs from the TwinGene cohort (a) and 1,622 pairs from the QIMR
cohort (b). The SNP-based predictor was adjusted for the first 20 principal
components. The variance of the SNP-based predictor can be partitioned into
four components (Vg, Ve, Cg and Ce) using the within-family prediction
analysis, in which Vg is the variance explained by real SNP effects, Cg is the
covariance between predictors attributed to the real effects of SNPs that are not
in LD but correlated due to population stratification, Ve is the accumulated
variance due to the errors in estimating SNP effects, and Ce is the covariance
between predictors attributed to errors in estimating the effects of SNPs that are
correlated due to population stratification. Error bars reflect s.e.m. of estimates.
c, The prediction R² shown on the y axis is the squared correlation between
phenotype and SNP-based genetic predictor in unrelated individuals from the
TwinGene (n = 5,668) and QIMR (n = 3,953) studies. The number shown in
each column is the number of SNPs selected from the GCTA joint and
conditional analysis at a range of P-value thresholds. In each case, the predictor
was adjusted by the first 20 principal components. The column in orange is the
average prediction R² weighted by sample size over the two cohorts. The dashed
grey line is the value inferred from the within-family prediction analyses using
this equation: R² = (Vg + Cg)²/(Vg + Ve + Cg + Ce).
Extended Data Figure 4 | Comparison of BMI-associated index SNPs across (y axes). c, d, Allele frequencies between ancestry groups, as in a and b.
Extended Data Figure 6 | Bubble chart representing the genetic overlap
across traits at BMI susceptibility loci. Each bubble represents a trait for
which association results were requested for the 97 GWS BMI loci. The size of
the bubble is proportional to the number of BMI-increasing loci with a
significant association. A line connects each pair of bubbles with thickness
proportional to the number of significant loci shared between the traits. Traits
tested include the current study BMI SNPs, African-American BMI (AA BMI),
hip circumference (HIP), HIP adjusted for BMI (HIPadjBMI), waist
circumference (WC), waist circumference adjusted for BMI (WCadjBMI),
waist-to-hip ratio (WHR), waist-to-hip ratio adjusted for BMI (WHRadjBMI),
height, adiponectin, coronary artery disease (CAD), diastolic blood pressure
(DBP), systolic blood pressure (SBP), high-density lipoprotein (HDL), low-
Extended Data Table 1 | Descriptive characteristics of meta-analyses

Meta-analysis                Total number   Maximum number   Number of    λGC
                             of studies     of subjects      SNPs*

European sex-combined
  GWAS                       80             234,069          2,550,021    1.526
  Metabochip                 34             88,137           156,997      1.25
  Joint GWAS+Metabochip      114            322,154          2,554,623    1.084

European men
  GWAS                       72             104,666          2,473,152    1.279
  Metabochip                 34             48,274           152,326      1.121
  Joint GWAS+Metabochip      106            152,893          2,477,617    1.006

European women
  GWAS                       74             132,115          2,491,697    1.336
  Metabochip                 33             39,864           153,086      1.029
  Joint GWAS+Metabochip      107            171,977          2,494,571    1.002

European population-based
  GWAS                       49             162,262          2,502,573    1.385
  Metabochip                 20             46,263           155,617      1.034
  Joint GWAS+Metabochip      69             209,521          2,506,448    1.003

All ancestries
  GWAS                       82             236,231          2,550,614    1.451
  Metabochip                 43             103,047          181,718      1.25
  Joint GWAS+Metabochip      125            339,224          2,555,496    1.004

* For the GWAS and joint GWAS+Metabochip analyses, SNP count reflects n = 50,000.
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Extended Data Table 2 | Previously known GWS BMI loci in European meta-analysis 
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SNP Chr:Position *Notable gene(s) Alleles EAF B SE P value 
rs1558902 16:52,361,075 FTO(B,N) A/T 0.415 0.082 0.003 7.51E-153 
rs6567160 18:55,980,115 MC4R(B,N) C/T 0.236 0.056 0.004 3.93E-53 
1813021737 = 2:622,348 TMEM18(N) G/A 0.828 0.06 0.004 = 1.11E-50 
1810938397 4:44,877,284 GNPDA2(N); GABRG1(B) GIA 0.434 0.04 0.003 3.21E-38 

rs543874 =1:176,156,103 SEC16B(N) G/A 0.193 0.048 0.004 2.62E-35 
rs2207139 6:50,953,449 TFAP2B(B,N) GIA 0.177 0.045 0.004 4.13E-29 
rs11030104 11:27,641,093 BDNF(B,M,N) AIG 0.792 0.041 0.004 5.56E-28 
rs3101336 = 1:72,523,773 NEGR71(B,C,D,N) C/T 0.613 0.033 0.003 2.66E-26 
rs7138803 12:48,533,735 BCDIN3D(N); FAIM2(D) AIG 0.384 0.032 0.003 8.15E-24 
rs10182181 2:25,003,800 sialic aia G/A 0.462 0.031 0.003 8.78E-24 

SH2B1(B,M,Q); APOBR(M,Q),; 
rs3888190 16:28,796,987 ATXN2L(Q); SBK1(Q,D); SULT1A2(Q); A/C 0.403 0.031 0.003 3.14E-23 
TUFM(Q) 

rs1516725 3:187,306,698 ETV5(N) C/T 0.872 0.045 0.005 1.89E-22
rs12446632 16:19,842,890 GPRC5B(C,N); IQCK(Q) G/A 0.865 0.04 0.005 1.48E-18
rs2287019 19:50,894,012 QPCTL(N); GIPR(B,M) C/T 0.804 0.036 0.004 4.59E-18
rs16951275 15:65,864,222 MAP2K5(B,D,N); LBXCOR1(M) T/C 0.784 0.031 0.004 1.91E-17
rs3817334 11:47,607,569 METI eee Ser Oy T/C 0.407 0.026 0.003 5.15E-17
rs2112347 5:75,050,998 POC5(M); HMGCR(B); COL4A3BP(B) T/G 0.629 0.026 0.003 6.19E-17
rs12566985 1:74,774,781 FPGT-TNNI3K(N) G/A 0.446 0.024 0.003 3.28E-15
rs3810291 19:52,260,843 ZC3H4(D,N,Q) A/G 0.666 0.028 0.004 4.81E-15
rs7141420 14:78,969,207 NRXN3(D,N) T/C 0.527 0.024 0.003 1.23E-14
rs13078960 3:85,890,280 CADM2(D,N) G/T 0.196 0.03 0.004 1.74E-14
rs10968576 9:28,404,339 LINGO2(D,N) G/A 0.32 0.025 0.003 6.61E-14
rs17024393 1:109,956,211 GNAT2(N); AMPD2(D) C/T 0.04 0.066 0.009 7.03E-14
rs12429545 13:53,000,207 OLFM4(B,N) A/G 0.133 0.033 0.005 1.09E-12
rs13107325 4:103,407,732 SLC39A8(M,N,Q) T/C 0.072 0.048 0.007 1.83E-12
rs11165643 1:96,696,685 PTBP2(D,N) T/C 0.583 0.022 0.003 2.07E-12
rs17405819 8:76,969,139 HNF4G(B,N) T/C 0.7 0.022 0.003 2.07E-11
rs1016287 2:59,159,129 LINC01122(N) T/C 0.287 0.023 0.003 2.25E-11
rs4256980 11:8,630,515 TRIM66(D,M,N); TUB(B) G/C 0.646 0.021 0.003 2.90E-11
rs12401738 1:78,219,349 FUBP1(N); USP33(D) A/G 0.352 0.021 0.003 1.15E-10
rs205262 6:34,671,142 C6orf106(N); SNRPC(Q) G/A 0.273 0.022 0.004 1.75E-10
rs12016871 13:26,915,782 MTIF3(N); GTF3A(Q) T/C 0.203 0.03 0.005 2.29E-10
rs12940622 17:76,230,166 RPTOR(B,N) G/A 0.575 0.018 0.003 2.49E-09
rs11847697 14:29,584,863 PRKD1(N) T/C 0.042 0.049 0.008 3.99E-09
rs2075650 19:50,087,459 TOMM40(B,N); APOE(B); APOC1(B) A/G 0.848 0.026 0.005 1.25E-08
rs2121279 2:142,759,755 LRP1B(N) T/C 0.152 0.025 0.004 2.31E-08
rs29941 19:39,001,372 KCTD15(N) G/A 0.669 0.018 0.003 2.41E-08
rs1808579 18:19,358,886 NPC1(B,G,M,Q); C18orf8(N,Q) C/T 0.534 0.017 0.003 4.17E-08


SNP positions are reported according to Build 36 and their alleles are coded based on the positive strand. Effect alleles, allele frequencies, betas (β), s.e.m., sample sizes (n), and P values are based on the
meta-analysis of GWAS I+II+Metabochip association data from the European sex-combined data set.
* Notable genes from biological relevance to obesity (B); GRAIL results (G); BMI-associated variant is in strong LD (r² ≥ 0.7) with a missense variant in the indicated gene (M); gene nearest to index SNP (N);
association and eQTL data converge to affect gene expression (Q); DEPICT analyses (D); copy number variation (C).
Extended Data Table 3 | Association of the GWS SNPs for BMI with cis-gene expression (cis-eQTLs)


SNP Chr. 
Novel loci 
rs11583200 1 
rs492400 2 
rs492400 2 
rs492400 2 
rs492400 2 
rs492400 2 
rs17001654 4 
rs9400239 6 
rs9400239 6 
rs1167827 7 
rs1167827 7 
rs1167827 7 
rs1167827 7 
rs1167827 7 
rs1167827 7
rs1167827 7 
rs9641123 7 
rs11191560 10 
rs11191560 10 
rs7164727 15 
rs9925964 16 
rs9925964 16 
rs9925964 16 
rs9914578 17 
rs1808579 18 
rs1808579 18 
rs17724992 19 
Previously reported loci 
rs10182181 2 
rs2176040 2 
rs13107325 4 
rs205262 6 
rs205262 6 
rs205262 6 
rs3817334 11
rs3817334 11
rs3817334 11 
rs3817334 11 
rs12016871 13 
rs12016871 13 
rs12446632 16 
rs12446632 16 
rs3888190 16 
rs3888190 16 
rs3888190 16 
rs3888190 16 
rs3888190 16 
rs3888190 16 
rs3888190 16 
rs1808579 18 
rs3888190 16 
rs3810291 19
Adipose


Gene 


ELAVL4 
PLCD4 
RQCD1 
RQCD1 
TTLL4 
TTLL4 
SCARB2 
HSS00296402 
HSS00296402 
PMS2P3 
PMS2P3 
PMS2P3 
PMS2P3
hsa-miR-653
SFXN2
VKORC1
ATXN2L
SH2B1
β for
GIANT
SNP


-0.066 
-0.054 
0.392 
-0.102 
0.018 
0.158 
0.248 
0.034 
0.015 
-0.595 
-0.027 
-0.030 
-0.346
0.025
0.017
-0.344
0.122
-0.010
-0.073
-0.014
0.022
-0.036 
-0.101 
-0.462 
-0.127 
-0.012 
-0.051 
0.044 
28.255 
-0.090 
-0.258 
-0.375 
0.028 
0.031 
0.303 
0.084 
-0.063 
-0.407 
-0.014
-0.386


P for 
GIANT SNP 


1.90E-12
2.43E-06
1.33E-10
Extended Data Table 4 | Putative coding variants in LD (r² ≥ 0.7) with GWS BMI loci


BMI SNP | Chr. | Source | Putative coding variant | r² | Gene | Protein alteration | PhastCon score | GERP score | Grantham score | PolyPhen | SIFT prediction | SIFT score


Novel genome-wide significant loci


rs492400 2 1000G rs3770213 0.89 ZNF142 L956H 0 -1.6 99 possibly damaging Damaging 0
rs492400 2 1000G rs3770214 0.89 ZNF142 S751G 0.2 1.4 56 benign Tolerated 0.08
rs492400 2 1000G rs2230115 0.963 ZNF142 A541S 0.5 5.1 99 benign Tolerated 0.044
rs492400 2 1000G rs1344642 0.963 STK36 R583Q 0 2.4 43 possibly damaging Damaging 0
rs492400 2 1000G rs1863704 0.89 STK36 G1003D 0 94 possibly damaging Tolerated 0.41
rs492400 2 1000G rs1863704 0.89 STK36 G982D 0 94 possibly damaging - -
rs492400 2 1000G rs3731877 0.792 TTLL4 E34Q 1 5.5 29 probably damaging Unknown Not scored
rs17001654 4 1000G rs61750814 1 NUP54 N250S 1 5.5 46 benign Damaging 0.05
rs4740619 9 1000G rs4741510 0.901 CCDC171 S121T 1 2 58 benign Damaging 0.05
rs4740619 9 1000G rs1539172 0.74 CCDC171 K1069R 1 4.1 26 benign Tolerated 1
rs2176598 11 1000G rs11555762 0.774 HSD17B12 S280L 0 0.4 145 benign Tolerated 0.74
rs3849570 3 1000G rs2229519 0.771 GBE1 R190G 1 4.8 125 benign Damaging 0.04
rs3736485 15 1000G rs12102203 0.966 DMXL2 S1288P 0.7 1.7 74 benign Tolerated 0.32
rs7164727 15 1000G rs2277598 0.839 BBS4 I82T 0 -4.4 89 benign Tolerated 0.47
rs9925964 16 1000G rs749670 0.869 ZNF646 E327G 1 4.2 98 benign Tolerated 0.44
Previously identified genome-wide significant loci

rs10182181 2 HapMap rs11676272 0.967 ADCY3 S107P 0 2.9 74 benign Tolerated 0.28
rs13107325 4 1000G rs13107325 1 SLC39A8 A324T 1 4.4 58 benign Tolerated 0.09
rs13107325 4 1000G rs13107325 1 SLC39A8 A391T 1 4.4 58 benign Tolerated 0.09
rs2112347 5 1000G rs2307111 0.862 POC5 H11R 0.9 5.8 29 benign Unknown Not scored
rs2112347 5 1000G rs2307111 0.862 POC5 H36R 0.9 5.8 29 benign Unknown Not scored
rs4256980 11 HapMap rs7935453 0.729 TRIM66 L630V - - - - Tolerated 1
rs4256980 11 1000G rs11042022 0.876 TRIM66 H466R - - - - Tolerated 0.38
rs4256980 11 1000G rs11042023 0.959 TRIM66 H324R 1 5.1 29 probably damaging Damaging 0.03
rs11030104 11 1000G rs6265 0.817 BDNF V148M 1 5.2 21 probably damaging Damaging 0
rs11030104 11 1000G rs6265 0.817 BDNF V66M 1 5.2 21 probably damaging Damaging 0
rs11030104 11 1000G rs6265 0.817 BDNF V74M 1 5.2 21 probably damaging Damaging 0
rs11030104 11 1000G rs6265 0.817 BDNF V81M 1 5.2 21 probably damaging Damaging 0
rs11030104 11 1000G rs6265 0.817 BDNF V95M 1 5.2 21 probably damaging Damaging 0
rs3817334 11 1000G rs1064608 0.809 MTCH2 P290A 1 5.1 27 probably damaging Tolerated 0.12
rs3888190 16 1000G rs180743 0.789 APOBR P428A 0.1 0.5 27 benign Unknown Not scored
rs3888190 16 1000G rs7498665 1 SH2B1 T484A 1 3.1 58 benign Tolerated 0.25
rs16951275 15 1000G rs7170185 1 LBXCOR1 W200R - - - - - -
rs1808579 18 1000G rs1805082 0.935 NPC1 I858V 1 6.1 29 benign Tolerated 0.24
rs1808579 18 1000G rs1805081 0.905 NPC1 H215R 0 -1.1 29 benign Tolerated 0.59
rs2287019 19 1000G rs1800437 0.714 GIPR E354Q 1 3.1 29 probably damaging Tolerated 0.09


r² is the LD between the BMI index SNP and the putative coding variant.
Shearing-induced asymmetry in entorhinal grid cells


Tor Stensola¹*, Hanne Stensola¹*, May-Britt Moser¹ & Edvard I. Moser¹


Grid cells are neurons with periodic spatial receptive fields (grids) that tile two-dimensional space in a hexagonal 
pattern. To provide useful information about location, grids must be stably anchored to an external reference frame. 
The mechanisms underlying this anchoring process have remained elusive. Here we show in differently sized familiar 
square enclosures that the axes of the grids are offset from the walls by an angle that minimizes symmetry with the 
borders of the environment. This rotational offset is invariably accompanied by an elliptic distortion of the grid pattern. 
Reversing the ellipticity analytically by a shearing transformation removes the angular offset. This, together with the 
near-absence of rotation in novel environments, suggests that the rotation emerges through non-coaxial strain as a 
function of experience. The systematic relationship between rotation and distortion of the grid pattern points to shear 
forces arising from anchoring to specific geometric reference points as key elements of the mechanism for alignment of
grid patterns to the external world.


Grid cells of the mammalian medial entorhinal cortex (MEC) are part
of a neural metric for self-localization that is independent of contextual
details. Grid patterns are thought to arise not by direct extraction of
features from sensory inputs, but by local network dynamics within the
entorhinal cortex itself. The coherent response of ensembles of grid
cells following experimental interventions, and the modular organization
of the grid cells, point to a neural architecture in which grid
patterns emerge by competitive interactions between interconnected
groups of neurons and move across the network in accordance with
the animal’s trajectory in the outside world.

For grid cells to be effective as an internal representation of the animal’s
location, the grid pattern must be anchored consistently to an
external reference frame provided by stationary sensory cues. Grid
patterns may be anchored at multiple locations and to multiple reference
frames, as suggested by the observation that the grid pattern fragments
into a mosaic of local sub-grids in environments with multiple
compartments. However, how the grid pattern is stabilized by specific
features of the environment is poorly understood. To address the mechanisms
of the anchoring process, we determined the orientation and
shape of the grid for a sample of 807 grid cells collected in familiar large
square open-field environments. Grid orientation could be determined
accurately because 80% of these cells (469/587 cells) had 6 or
more grid fields. Square recording environments were chosen because
they contain axes of symmetry that are distinct from the symmetry
axes of the grid pattern and so might be suitable for inferring solutions
to the alignment problem.


Grid orientation reflects boundaries 


The orientation of the grid axes did not distribute randomly across
modules and animals. In each module of grid cells from each animal,
the orientation of the grid axes clustered around the cardinal axes of
the environment, defined by the north-south and east-west walls of the
test box, confirming previous work showing that axes of the grid pattern
tend to align with the walls in a square environment.

The majority of data were recorded in a 1.5 m wide square enclosure 
(7 data sets, 6 rats, 587 grid cells, 18 grid modules). In this environment, 


grid orientations clustered around the east-west axis and 60° multiples
of this axis (Fig. 1a-c, mean ± s.e.m. grid orientation averaged across
the three main grid axes: −1.91 ± 0.24°). Clustering around these axes
was observed for all grid modules in all animals (Fig. 1b, c and Extended
Data Figs 2 and 3). A Rayleigh test confirmed that mean grid orienta- 
tion, averaged across grid axes, was highly non-uniform (Z = 405, P= 
2.2 X 10°, distribution multiplied by 6 to achieve 360°). 
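
The Rayleigh test used here can be illustrated with a short sketch. It assumes the standard statistic Z = nR² (R being the mean resultant length) and a large-sample approximation P ≈ exp(−Z); the function name, the simulated orientations and the random seed are illustrative assumptions, not the authors' analysis code. Multiplying the angles by 6 maps the 60° periodicity of the grid onto the full 360° circle, as described in the text.

```python
import numpy as np

def rayleigh_test(angles_deg, period_deg=360.0):
    """Return (Z, approximate P) for the Rayleigh test of circular uniformity.

    Angles that repeat every `period_deg` degrees are first stretched onto the
    full circle (for a 60-degree period this multiplies them by 6).
    """
    theta = np.deg2rad(np.asarray(angles_deg, dtype=float) * (360.0 / period_deg))
    n = theta.size
    # Mean resultant length R of the unit vectors.
    r = np.hypot(np.cos(theta).sum(), np.sin(theta).sum()) / n
    z = n * r ** 2
    # Large-sample approximation; small-sample corrections are omitted.
    p = float(np.exp(-z))
    return float(z), p

# Hypothetical grid-axis orientations clustered near 0 deg (mod 60 deg).
rng = np.random.default_rng(0)
clustered = rng.normal(0.0, 5.0, size=200) + 60.0 * rng.integers(0, 6, size=200)
z_clustered, p_clustered = rayleigh_test(clustered, period_deg=60.0)
```

Without the period rescaling, axes spaced 60° apart cancel each other and the test would report a uniform distribution even for perfectly aligned grids.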

However, although the orientation of the grid pattern distributed
symmetrically around the east-west axis, closer inspection suggested
that most grid cells were offset by a few degrees in either direction from
the cardinal axis (Fig. 1c, d). To quantify the offset, we identified for
each grid axis (Ax1-Ax3) in each cell the wall that had the smallest angular
deviation from that axis (denoted W_N1-N3). We then selected the
minimal value Amin among the three angles (Extended Data Fig. 1a).
The distribution of Amin across the cell population had a sharp unimodal
peak centred at 7.40° (mean absolute value ± s.e.m.: 7.15 ± 0.15°;
Fig. 1d), not far off from the 7.5° offset that would have minimized symmetry
or overlap with the cardinal and diagonal axes of the environment
(Fig. 1e, f). For the vast majority of grid cells in the 1.5 m environment
(575/587 cells), Amin was defined by the east-west axis (average positive
offset: 7.48 ± 0.17°, n = 362; negative offset: 6.11 ± 0.22°, n = 213). Very
few data points distributed near 0° and 15°, the two orientations that
overlapped with the symmetry axes of the box (Fig. 1e, f; Extended Data
Fig. 1b). The distribution of Amin was highly non-uniform (Rayleigh test,
Z= 62.5,P=14X10 *%). 
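
The offset measure can be made concrete with a minimal sketch. It assumes that the walls of the square box run along 0° and 90° and that each grid axis is given as an orientation in degrees; the function names are hypothetical and the signed-offset convention is an illustrative choice rather than the authors' implementation.

```python
def wall_offset(axis_deg):
    """Signed deviation (deg) of a grid axis from the nearest wall direction.

    Walls of a square box run along 0 and 90 deg, so the result lies in [-45, 45).
    """
    return (axis_deg + 45.0) % 90.0 - 45.0

def minimal_offset(axes_deg):
    """Offset of the grid axis that lies closest to any wall (Amin in the text)."""
    return min((wall_offset(a) for a in axes_deg), key=abs)

# A perfectly hexagonal grid rotated by 7.5 deg away from the east-west wall.
axes = [7.5 + 60.0 * k for k in range(3)]
offsets = [wall_offset(a) for a in axes]
a_min = minimal_offset(axes)
```

For this idealized grid the three axes lie 7.5°, 22.5° and 37.5° from their nearest walls, so the minimal offset is set by axis 1.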

Amin was similar between cells that belonged to the same grid mod- 
ule, but the direction often differed across modules (Extended Data 
Fig. 4). Positive and negative offsets distributed across animals with 
nearly equal probability (7/18 modules had positive offsets). In animals 
with data from ≥ 2 modules, the offset directions appeared to be drawn
independently (35% of module pairs within animals had different off- 
set directions; P = 0.26, binomial test), suggesting that grid orientation 
is determined within modules and not globally as a function of behaviour. 

The rotational offset differed between the three grid axes. For a perfectly
hexagonal grid, as a result of sixfold symmetry, a 7.5° offset in
axis 1 should result in distributions around 22.5° from W_N2 and 37.5°
from W_N3 for axis 2 and 3, respectively (Fig. 1g, h; Extended Data Fig. 1b).
However, in the data, the greater the angle between a grid axis and the
corresponding W_N, the smaller was the deviation from the nearest 60°
multiple of the parallel solution (Amin = 0°) (Fig. 1h, Extended Data
Fig. 1d-g). As a consequence of the distinct rotations, the hexagonal
symmetry of the grid pattern was perturbed. In most cells the inner 6
fields of the spatial autocorrelogram formed an ellipse rather than a circle
(Fig. 2a, b; Extended Data Fig. 5). A similar deformation of the grid
pattern has been observed under other experimental circumstances.

Figure 1 | Grid alignment in a 1.5 m square
environment. a, Colour-coded firing-rate maps
(top) and spatial autocorrelograms (bottom) for
representative grid cells from two animals in the
1.5 m square. Innermost hexagon of vertices is
indicated. b, Polar scatterplot showing distribution
of grid orientation and grid spacing for 587
grid cells in the 1.5 m box. For each cell, the location
of the 6 innermost fields in the spatial
autocorrelogram is shown (6 dots per cell; distance


Grid rotation reflects shearing 


In continuum mechanics, shearing is a deformation where points on a
plane are shifted along the shear axis with a displacement proportional
to the distance from the shear axis (Fig. 2c). Distances between
points that are perpendicular to the shear axis remain constant. The
shear transform is described for two dimensions by:

$$A = \begin{bmatrix} 1 & \gamma_x \\ \gamma_y & 1 \end{bmatrix}$$

or, with algebraic notation:

$$f(x, y) = \begin{bmatrix} x + \gamma_x y \\ y + \gamma_y x \end{bmatrix}$$

where γy is the shear parameter along the y axis, γx the shear parameter
along the x axis, and x and y are row vectors of initial coordinates of
points in the plane. For simple shearing, only one of the shear parameters
has a non-zero value. When the points that are deformed by
simple shearing lie on a circle, the circle becomes elliptical (with the
resulting ellipse referred to as the strain ellipse). This deformation is
accompanied by non-coaxial rotation of axes in this circle, proportional
to the angular distance from the shear axis. Forces that act on a plane to
produce simple shearing can be deduced from the ellipticity and orientation
(the semi-major and semi-minor axes) of the strain ellipse.

[Figure 1, axis labels: Angular offset from nearest wall (deg); Offset from 60° multiple of parallel configuration (deg)]

for grid axis with smallest offset from box walls
(axis 1, Amin). Red line indicates orientation that
would provide minimal symmetry (7.5°, Fig. 1e, f).
e, Top, schematic showing expected offsets for
individual grid axes of a perfectly hexagonal grid at
different degrees of rotation. Bottom, overlap
between symmetry axes of box and grid at these
rotations. Red lines show common symmetry axes.
f, Minimal distance between field locations in a
hexagonal grid and in the same pattern reflected
around the cardinal or diagonal axes of the box,
estimated for grid orientations in 0.5° steps.
g, Distribution of angular offsets from the nearest
wall axis W_N for each grid axis. h, Distributions
of angular offset from W_N or 60° multiples of
this orientation. Peak angular offsets from nearest
60° multiple are indicated. Offset from respective
W_N in brackets.

The close link between deformation and rotation of the grid pattern
led us to hypothesize that elliptification of the grid reflects shearing along 
either of the box axes and that, through non-coaxial rotation, such de- 
formations underlie the consistent angular offset of the grid axes (Fig. 2c). 
We tested these hypotheses by applying simple shear transformations 
to all grid patterns, with shear axes parallel to the box axes, in the re- 
verse direction of the transform that might have caused the rotation in 
the first place (Fig. 2d). For each cell, we determined the shear para- 
meter that reversed grid ellipticity to a minimum along either cardinal 
axis. 
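
A simple shear and the non-coaxial rotation it produces can be sketched numerically. The sketch assumes a shear matrix with ones on the diagonal and the two shear parameters off the diagonal, and represents an idealized grid by the six vertices of a regular hexagon; the helper names (`shear_matrix`, `hexagon`, `axis_angles`) and the value 0.1 are hypothetical choices for illustration.

```python
import numpy as np

def shear_matrix(gamma_x=0.0, gamma_y=0.0):
    """2-D shear matrix; simple shearing uses only one non-zero parameter."""
    return np.array([[1.0, gamma_x], [gamma_y, 1.0]])

def shear(points, gamma_x=0.0, gamma_y=0.0):
    """Apply the shear to an (N, 2) array of x, y coordinates."""
    return np.asarray(points, dtype=float) @ shear_matrix(gamma_x, gamma_y).T

def hexagon(orientation_deg=0.0, spacing=1.0):
    """Six vertices of a regular hexagon, standing in for the inner grid fields."""
    ang = np.deg2rad(orientation_deg + 60.0 * np.arange(6))
    return spacing * np.column_stack([np.cos(ang), np.sin(ang)])

def axis_angles(points):
    """Orientation (deg) of the first three vertices, i.e. the three grid axes."""
    return np.rad2deg(np.arctan2(points[:3, 1], points[:3, 0]))

# Shear along the y (north-south) axis: x is unchanged, y is shifted by gamma_y * x.
before = axis_angles(hexagon())
after = axis_angles(shear(hexagon(), gamma_y=0.1))
rotation = after - before
```

In this toy case the axis perpendicular to the shear axis rotates most, the two oblique axes rotate less, and a point lying on the shear axis does not move at all.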

The effects were direction-dependent. Shearing along the north-south
axis, orthogonal to the alignment axis (east-west), reduced ellipticity
from 1.17 ± 0.004 to 1.06 ± 0.009 (means ± s.e.m.). The angular offset
of the grid pattern was completely abolished by this transform (Fig. 2d, e).
After shearing, orientations distributed unimodally around the east-west
axis, with almost no offset (peak for Amin at −0.9°, s.d.: 6.6°; Amin before
versus after shearing: Z = 10.7, P= 7.4 X 10 *”, Wilcoxon rank sum
test). There was a strong correlation between the angular offset of the
grid and the optimal shear parameter γ, that is, the force used to minimize
ellipticity in the grid (Fig. 2f; r = −0.60, P = 1.04 X 10 °*; versus
absolute angular offset: r = 0.16, P = 5.34 X 10° *). Mere compression
towards the diagonal could not reproduce the relationship between
deformation and offset (Extended Data Fig. 6). Shearing in the east-west
direction, parallel to the alignment axis, did not reduce the offset
of the grid, despite similar attenuation of the ellipticity (Fig. 2e; Z =
0.44, P = 0.66; ellipticity ± s.e.m.: 1.06 ± 0.009). Grid axes with a smaller
deviation from the shear axis showed less rotation (Fig. 1g, h; Extended
Data Fig. 1d-g). These observations would all be expected if the
grid axes exhibit shearing-induced non-coaxial rotation, proportional
to their angular distance from the shear axis.
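
The reverse-shearing procedure can be sketched on an idealized pattern. The sketch makes two assumptions that are not taken from the paper: the six inner fields are treated as an exact affine image of a regular hexagon, so that ellipticity can be read from the eigenvalues of their second-moment matrix instead of from a fitted ellipse, and the optimal shear parameter is found by a plain search over a hypothetical range of values.

```python
import numpy as np

def hexagon(orientation_deg=0.0):
    """Six vertices of a regular hexagon on the unit circle."""
    ang = np.deg2rad(orientation_deg + 60.0 * np.arange(6))
    return np.column_stack([np.cos(ang), np.sin(ang)])

def shear_ns(points, gamma):
    """Simple shear along the north-south (y) axis: y -> y + gamma * x."""
    return points @ np.array([[1.0, 0.0], [gamma, 1.0]]).T

def ellipticity(points):
    """Semi-major / semi-minor axis ratio for an affine image of a regular hexagon."""
    eigenvalues = np.linalg.eigvalsh(points.T @ points)  # ascending order
    return float(np.sqrt(eigenvalues[1] / eigenvalues[0]))

def best_reverse_shear(points, gammas=np.linspace(-0.5, 0.5, 1001)):
    """Shear parameter on the search grid that minimizes ellipticity."""
    return float(min(gammas, key=lambda g: ellipticity(shear_ns(points, g))))

# Deform an aligned grid so that axis 1 is rotated by 7.5 deg, then undo it.
gamma_true = np.tan(np.deg2rad(7.5))
observed = shear_ns(hexagon(), gamma_true)
gamma_hat = best_reverse_shear(observed)
restored = shear_ns(observed, gamma_hat)
offset_after = float(np.rad2deg(np.arctan2(restored[0, 1], restored[0, 0])))
```

In this idealized setting, minimizing ellipticity along the north-south axis recovers the negative of the original shear parameter and returns axis 1 to the east-west wall, which mirrors the link between deformation and offset described above.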

Finally, we tested, in a different set of rats, whether shearing developed
with experience, as hypothesized. We found that when rats explored
a square box for the first time, the grid was only minimally
offset from the wall (Fig. 2g, h; Extended Data Fig. 7; 20 cells, median
Amin of 2.7°). Amin was significantly larger when the same animals explored
a familiar square (median of 9.8°, 13 cells, all recorded also in
the novel box, Wilcoxon rank sum test: Z = 2.12, P < 0.05), suggesting
that when an animal encounters an environment for the first time,
one of the grid axes is aligned to one of the walls and, over time, shearing
causes the axes to rotate away in an orientation-dependent manner.


Shear forces may operate interactively 

In large environments, the ability of the grid map to maintain accurate 
spatial representations may be jeopardised by the increased distance 
between salient geometric features. To address this possibility, we ana- 
lysed grid orientation and deformation in 220 cells (13 modules, 5 rats) 
from a 2.2 m wide square box, with a surface area more than twice that 
of the 1.5 m environment. In this environment, the cells clustered around 
both cardinal axes (Fig. 3a-c; Rayleigh test, Z = 18.9, P= 4.4 X 10°). 
As in the 1.5m box, Amin was not far off from 7.5° (Fig. 3d; mean of 
7.37°; Extended Data Figs 1h and 4). Only a few cells had offsets near 
0° or 15°. A small cell sample was also tested in a large circular envir- 
onment (2.0 m diameter; Extended Data Fig. 8). Here grid orientation 
was more distributed, as expected in the absence of unique reference 
orientations. 

We asked if grids in the 2.2 m box underwent shearing-induced deformation
in the same way as in the smaller environment, and if grid
orientation offsets still could be eliminated by reverse shearing. Grid
cells were as elliptical as in the smaller environment (mean ± s.e.m.
ellipticity: 1.17 ± 0.007), but the distribution of ellipse orientations was
less clustered (Fig. 3e, Extended Data Fig. 5d-g; Rayleigh test: Z = 2.97,
P = 0.05), suggesting that, in the large box, the deformation might not be
ascribed to shearing along a single cardinal axis. To test this possibility,
we again sheared each grid pattern along either of the box axes. Separate
analyses were performed for cells that aligned to the north-south and
east-west axes. Simple single-axis shearing reduced grid ellipticity
to a similar extent as in the smaller environment (from 1.17 ± 0.007
to 1.06 ± 0.007) but the angular offset of the grid (Amin) was not fully
eliminated, even when the shear axis was orthogonal to the alignment
axis (mean offset ± s.e.m. from 7.4 ± 0.02° to 4.9 ± 0.02°; Z = 6.4,
P=1.6X10 1°, Wilcoxon rank sum test; Fig. 3f; Extended Data
Fig. 9a, b). The offset was not reduced when the shear axis was parallel
to the alignment axis (mean ± s.e.m. after shearing: 8.6 ± 0.02°).

Figure 2 | Relationship between orientation and
deformation of grid patterns. a, Autocorrelogram
showing ellipse fit to inner hexagon of firing
fields of an example cell, with semi-major and
semi-minor axes indicated as α and β. b, Polar
scatterplot showing eccentricity and orientation of
ellipse fits for each cell in a sample of 587 grid cells
in the 1.5 m environment (two dots per cell, as
indicated with asterisks in a). c, Schematic showing
deformation and non-coaxial rotation before and
after simple shearing. Amplitude and direction
of shear forces are indicated by arrows (bottom
left). Bottom right, orientation of strain ellipse
(orange line) and effect on grid orientation (purple
axes). d, Schematic showing reversal of grid
orientation in two example ellipses (modules M1
and M4) following shearing in the indicated
direction. Minimizing ellipticity by reverse north-south
shearing is accompanied by re-alignment
of grid axes. e, Left, spike maps (trajectories with
spikes) and autocorrelograms showing grid
orientation for an example cell (module M1) before
and after reverse shearing along each cardinal
axis. Amin is indicated by red lines. Right,
distribution of orientation for each grid axis
following minimization of ellipticity by reverse
shearing. Note disappearance of 7.5° orientation
peaks (asterisks) only following north-south
shearing. f, Scatterplot showing strong correlation
between angular offset (Amin before shearing)
trial familiar). Red lines indicate Amin. h, Mean
orientation offset (± s.e.m.) for cells recorded in
[Figure 2, axis label: Grid orientation (deg)]

The fact that shearing only partly removed the angular offset raises 
the possibility that anchoring was maintained also by forces from the 
orthogonal wall. Indeed, the rotational offset of the 3 grid axes in the 
2.2 m box did not match predictions from single axis shearing but fit 
well with added shearing from the second wall (Fig. 3f, g, Extended Data 
Figs 5h and 9c-e). When shearing from 2 axes was applied simulta- 
neously across a range of shear parameter values, unique solutions could 
be found that minimized ellipticity to a much better extent than in the 
one-axis scenario, with complete reversal of the angular offset in cases 
where the grid was anchored to diagonally opposite corners (Fig. 3h; 
Extended Data Figs 7 and 10). 


Shear forces may operate locally 

Consistent with the idea that grids anchor locally, some grid modules
had orientations in the 2.2 m box that changed gradually (Fig. 4a, c) or
discretely (Fig. 4b, d) across the arena. In cells with heterogeneous anchoring
patterns, the angular offsets of fragments of the grid, but not
the global orientation, remained close to 7.5° (Fig. 4c, d). Simple combinations
of shear-like forces from particular walls could reproduce the
graded and fragmented patterns in simulated grids, with orientation
offsets and ellipse orientations that matched observed values (Fig. 4c, d;
Extended Data Fig. 9d, e). This suggests that the same forces operate on
locally anchored grid segments as on grids with uniform global geometry.

Figure 3 | Grid alignment in a 2.2 m square
environment. a, Rate maps and spatial
autocorrelograms for representative grid cells from
the 2.2 m square. Symbols as in Fig. 1a. b, Polar
scatterplot showing distribution of grid orientation
and grid spacing for 220 grid cells in the 2.2 m box.
c, Frequency distributions showing clustering
around 60° multiples of the two cardinal axes.
d, Distribution of Amin, as in Fig. 1d. e, Polar
scatterplot showing distribution of ellipse fits as in
Fig. 2b. f, Distributions of grid orientations
following minimization of ellipticity by shearing
along either cardinal axis. Axis distributions as in
Fig. 1g. Cells aligned to the east-west walls were
first rotated 90°. g, Frequency distributions
showing, for each grid axis, the angular offset from
the nearest 60° multiple of the cardinal axes (as in
Fig. 1h). Distribution means and s.d. are indicated.
Red lines show the predicted offsets following
simple shearing orthogonal to the alignment axis
until Amin is 7.5°. Note systematic deviations in
axis 2 and axis 3, suggesting shearing from a second

In order to evaluate the extent of geometric heterogeneity in the grid
pattern following local alignment, we estimated deformation and orientation
of grid cells in each quadrant of the 1.5 m and 2.2 m environments
(Fig. 4e). The difference in grid properties across quadrants was
determined by calculating the product of all pairwise correlations between
module-specific autocorrelograms made for each quadrant (Fig. 4e-h
and Extended Data Fig. 9f). In the 1.5 m enclosure, this product was
large (Fig. 4f; median ± s.d.: 0.81 ± 0.11, 9 modules, 3 animals), indicating
consistent global alignment. In the 2.2 m enclosure, the product
was significantly lower (Fig. 4g, h; median ± s.d.: 0.60 ± 0.21, 8 modules,
2 animals; Wilcoxon rank sum test, rank sum: 105, P = 0.02), suggesting
that grid patterns were less uniformly anchored.
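
The quadrant comparison can be illustrated as follows. The sketch assumes that each quadrant autocorrelogram is a 2-D array of equal shape and that "correlation" means the Pearson correlation of the flattened maps; both the function name and the toy arrays are hypothetical.

```python
from itertools import combinations

import numpy as np

def pairwise_correlation_product(maps):
    """Product of Pearson correlations over all pairs of equally shaped 2-D maps."""
    product = 1.0
    for a, b in combinations(maps, 2):
        product *= np.corrcoef(np.ravel(a), np.ravel(b))[0, 1]
    return float(product)

# Toy example: four identical quadrant maps versus one deviating quadrant.
base = np.arange(25, dtype=float).reshape(5, 5)
uniform_product = pairwise_correlation_product([base, base, base, base])
mixed_product = pairwise_correlation_product([base, base, base, base.T])
```

With four quadrants there are six pairs, so a single deviating quadrant enters three of the six factors and pulls the product down quickly, which makes the measure sensitive to local differences in grid geometry.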

To determine if alignment to borders is distance-dependent, we
finally asked whether deformation of the grid pattern is stronger near
walls and corners than in central parts of the box. We divided the data
from the 2.2 m box into 9 (3 × 3) sub-divisions (Fig. 4i). Grid orientation
and grid deformation were then estimated for each grid module
with 6 or more detectable fields in each sub-division. The central bin of
the 2.2 m box showed significantly higher grid scores across modules
and animals than the corner bins (Fig. 4j; Kruskal-Wallis ANOVA:
H = 6.6, d.f. = 2, P < 0.05, Tukey post-hoc test). The decrease in rotational
symmetry at the corners was accompanied by a significant increase
in ellipticity in these bins compared to the central bin (Fig. 4k;
H = 7.99, d.f. = 2, P < 0.05), suggesting that distortions increase with
proximity to the corners. One particular corner showed a remarkably
low variation in the distribution of ellipse orientation (Fig. 4l). For every
single animal, this corner segment was where the animal was released
into the box, raising the possibility that anchoring is determined by the
animal’s initial experience.
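
The 3 × 3 sub-division of the box can be sketched as a simple binning step. The sketch assumes positions in metres with the origin at one corner of the 2.2 m box; the function names and the corner/edge/centre labels are illustrative assumptions rather than the authors' terminology.

```python
def subdivision(x, y, box_size=2.2, n=3):
    """Return (row, col) of the n x n sub-division containing position (x, y)."""
    width = box_size / n
    col = min(int(x // width), n - 1)
    row = min(int(y // width), n - 1)
    return row, col

def bin_type(row, col, n=3):
    """Classify a sub-division as 'corner', 'edge' or 'centre'."""
    on_row_border = row in (0, n - 1)
    on_col_border = col in (0, n - 1)
    if on_row_border and on_col_border:
        return "corner"
    if on_row_border or on_col_border:
        return "edge"
    return "centre"

# Position samples (in metres) assigned to their sub-division type.
samples = [(1.1, 1.1), (0.1, 2.15), (1.1, 0.1)]
labels = [bin_type(*subdivision(x, y)) for x, y in samples]
```

Grid score and ellipticity would then be computed separately from the fields that fall into each sub-division before the bins are compared.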


Discussion 

The findings provide evidence for a mechanism by which geometric 
features of the environment cause local rotation and deformation of the 
hexagonal firing patterns of grid cells. Previous work has suggested that 
grid cells are oriented with one axis more or less parallel to one of the 
walls in a square environment. Our findings confirm this observation
but show that within the scatter of grid orientations there is a limited 
number of orientation solutions, each corresponding to an average off- 
set of ~7.5° in either direction from either wall of the environment. We 
show that the rotation of the grid axes is accompanied by elliptic de- 
formation of the grid pattern, raising the possibility that rotation and 
deformation are part of a single process. 

In the small environment, the orientation offset could be comple- 
tely and selectively abolished by non-coaxial rotation through a sim- 
ple shear transform that minimized distortion (ellipticity) of the grid 
pattern specifically along one of the wall axes, in the reverse direction 
of the rotation that developed as animals got familiar with the envi- 
ronment. In the large environment, the grid pattern aligned interac- 
tively and in a distance-dependent manner to several walls. However, 
minimization of ellipticity by a two-axis shearing transform abolished 
the offset. The findings suggest that the rotational offset of the grid
pattern is a consequence of shearing forces determined by the symmetry
axes of the environment, and that, for a specific axis of the grid,
the angular rotation depends directly on the angular deviation from
the shear axis.

Figure 4 | Grids can have multiple alignment
solutions. a, b, Spike and rate maps for two grid
(right) shearing to minimize ellipticity. Asterisk,
offset is still present. e, Stack of spatial

How shearing is implemented in the entorhinal microcircuit has not
yet been established. Shearing may begin when grid fields are anchored
by associative plasticity to features of the boundaries, such as corners,
during the animal’s first exposure to the environment and this initial
anchoring is followed on subsequent sessions by shrinkage of the grid
pattern. With gradual compression and maintained anchoring, the
end result is a deformation of the grid, accompanied by rotation of those
grid axes that have a strong offset from the shear axis. A related possibility
is that grid cells are subject to continuous repulsion from the
boundaries of the environment. The rotational offset of the grid speaks
against a uniform repulsion, but repulsive forces from the walls may
lead to shearing of the grid pattern if the grid is initially anchored to one
or a few reference positions only. Possible mediators of such repulsive
effects include entorhinal ‘border cells’, whose firing fields line up in
parallel with the walls and the corners of the environment.

The stabilization of grid orientation at a similar solution of 7.5° in
nearly all animals raises the possibility that this orientation confers a
functional advantage. This particular rotation minimizes symmetry
between the grid pattern and the geometry of the environment. In tandem
with the deformation of the grid, the rotation may increase differences
between activity patterns at geometrically equivalent locations
and thereby reduce the frequency of errors in self-localization, such as
the confounding of diagonally opposite corners in rectangular boxes.
The observations are consistent with a broad literature pointing to environmental
symmetry axes as major determinants of firing locations
in place cells and grid cells, but take these insights further by
introducing a mechanism by which forces from the borders align grids
to the environment in a way that may optimize the uniqueness of geometrically
comparable places.

j, but for grid ellipticity. *P < 0.05. l, Distribution
of ellipse orientation (grey lines) across
subdivisions. Dashed black lines indicate circular
mean orientations; blue areas show circular s.d.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS 


Source of data and definition of grid cells. The data are taken from two previous 
publications from our laboratory in which alignment of the grid pattern was not 
addressed (refs 2, 12). The 807 cells recorded in familiar square boxes were obtained from 
ref. 12; the 23 cells recorded in novel square boxes or circular boxes were taken 
from ref. 2. Grid cells were defined as cells with rotational symmetry scores (‘grid 
score’) exceeding the 95th percentile of a shuffled distribution, and modules of grid 
cells were identified by a k-means clustering procedure. Mean grid score (± s.e.m.) 
was 0.96 ± 0.009 (95th percentile chance level: 0.20). The data were recorded across 
a wide area of dorsomedial MEC and contained a range of grid scales, with grid 
spacing spanning from 35 to 172 cm (mean range of grid spacing per animal ± s.e.m.: 
58.1 ± 33.8 cm). 

Subjects. Neural activity was recorded in 23 male Long Evans rats that were 3-5 
months old (350-450 g) at the time of implantation (12 from ref. 12; 11 from ref. 2). 
In 5 of the rats from ref. 12, and in all rats from ref. 2, the tetrodes were implanted 
near-tangentially to the MEC surface. The remaining 7 rats had multisite implants 
of vertically oriented tetrodes that collectively covered a large part of the MEC. 
After surgery, the rats were housed individually in large Plexiglas cages (45 × 44 
× 30 cm) in a humidity- and temperature-controlled environment. They were kept 
on a 12 h light/12 h dark schedule. All testing occurred during the dark phase. The 
experiments were performed in accordance with the Norwegian Animal Welfare 
Act and the European Convention for the Protection of Vertebrate Animals used 
for Experimental and Other Scientific Purposes. The study contained no random- 
ization to experimental treatments and no blinding. Sample size (number of animals) 
was set a priori to 5 or more, considered as the minimum required for statistical 
power in parametric tests for the type of data used in the present study. 
Electrode implantation and surgery. Tetrodes were constructed from four twisted 
17 µm polyimide-coated platinum-iridium (90-10%) wires (California Fine Wire, 
CA). The electrode tips were plated with platinum to reduce electrode impedances 
to between 120 and 300 kΩ at 1 kHz. 

Anaesthesia was induced by placing the animal in a closed glass box filled with 
isoflurane vapour. Following this, the animal received an i.p. injection of 
Equithesin (pentobarbital and chloral hydrate; 1.0 ml per 250 g body weight) or 
anaesthesia was induced by isoflurane (air flow was kept at 1.2 l per minute with 0.5-3% 
isoflurane as determined by physiological monitoring). For Equithesin anaesthesia, 
supplementary doses were given when breathing patterns changed and reflexes 
began to return (0.15 ml per 250 g). Local anaesthetic (Xylocain) was applied on the 
skin before making the incision. Holes were drilled on the dorsal skull, anterior to 
the transverse sinus, after which the rat was implanted with either two microdrives 
carrying a single bundle of 4 tetrodes each (one per hemisphere) or a ‘hyperdrive’ 
consisting of 14 independently movable tetrodes. Hyperdrive implants were always 
on the left side, whereas microdrives were implanted on both sides. Microdrive 
tetrodes were inserted at an angle of 20° relative to the bregma/lambda horizontal 
reference, with the tips pointing in the anterior direction. The tetrodes were in- 
serted 0.1 mm anterior to the transverse sinus edge. Hyperdrive tetrodes were im- 
planted vertically between 3.75 and 5.0 mm lateral to bregma, with the posterior 
border of the bundle located 0.1-0.2 mm anterior to the edge of the transverse sinus. 
Jewellers’ screws and dental cement were used to secure the drive to the skull. One 
or two screws in the skull were connected to the drive grounds. During the first 
days after the surgery, the animals received oral doses of the analgesic Metacam 
(Meloxicam, 0.1 mg per 300 g; Boehringer Ingelheim, Germany). 
Recording procedures. Over the course of ~1-3 weeks, tetrodes were lowered in 
steps of 300 µm or less until large-amplitude theta-modulated activity appeared in 
the local field potential at a depth of about 2.0 mm or lower. In hyperdrive experi- 
ments, two of the tetrodes were used, respectively, to record a reference signal from 
white matter areas. The drive was connected to a multichannel, impedance match- 
ing, unity gain headstage. The output of the headstage was conducted via a coun- 
terbalanced lightweight multiwire cable to an Axona acquisition system (Axona Ltd, 
Herts, UK, for all tangentially implanted animals) or via a lightweight multiwire 
tether cable and through a slip-ring commutator to a Neuralynx data acquisition 
system (Neuralynx, Tucson, AZ; Neuralynx Cheetah 64, for all multisite implanted 
animals). Both cables allowed the animal to move freely within the available space. 
Unit activity was amplified by a factor of 3,000-5,000 and band-pass filtered from 
800 to 6,700 Hz (Axona) or from 600 to 6,000 Hz (Neuralynx). Spike waveforms 
above a threshold set by the experimenter (~55 µV) were time-stamped and 
digitized at 32 kHz for 1 ms. EEG signals were amplified by a factor of 1,000 and recorded 
continuously between 0 and 475 Hz at a sampling rate of 1,893 Hz or 2,034 Hz. Light- 
emitting diodes (LEDs) on the headstage were used to track the animal’s movements 
at a sampling rate of 25 Hz (Neuralynx) or 50 Hz (Axona). The rat rested on a towel 
in a large flowerpot on a pedestal while electrical activity was monitored. 

Data from the novel environment were recorded after the animals had been 
trained for 12 days or more in an accompanying familiar environment. Familiar 
trials lasted 10 min; novel trials lasted 10-30 min, until the novel space was fully 
covered by the animal. Stable grid patterns could be seen from the beginning in the 
novel box (mean gridness ± s.e.m. for the entire trial: 0.88 ± 0.14; familiar box: 
0.91 ± 0.09; 95th percentile of shuffled distribution: 0.20). 

Behavioural procedures. The rats were kept at ~90% of their free-feeding body 
weight and food deprived 12-18 h before each training or recording session. Dur- 
ing the 2-3 weeks between surgery and testing, the animals were trained to collect 
randomly scattered vanilla or chocolate biscuit crumbs in either of two sets of 
environments. Rats from ref. 12 were trained in 1.5 m × 1.5 m or 2.2 m × 2.2 m square 
black 0.5 m high boxes with black floor mats and black walls. Rats in ref. 2 were 
trained in 1.0 m × 1.0 m square boxes or 2.0 m wide circular boxes, all black and all 
with 0.5 m high walls. Abundant visual cues were available external to each record- 
ing box (Extended Data Fig. 1). The 100 cm wide squares were presented in both novel 
and familiar recording rooms, using an ABA design (familiar—novel—familiar). 
Different rooms were used for the 1.5 m and 2.2 m boxes. In all rooms, the boxes 
were placed at the back end of the room, with the recording system and 
experimenter at the front end, near the door. In ref. 12, a straight separation wall (1.5 m 
box) or a separation curtain with an opening (2.2 m box) was placed between the 
box and the experimenter. No separation was used in the 1.0 m square or in the 
circle. Recordings in the largest box lasted 30 min or occasionally more; in the 1.5 m 
box, the minimum duration was 15 min. Recordings in the 1.0 m box lasted 10 min 
or more; in the circle, trials were 20 min. Floor mats were always washed between 
trials. Before and between trials, the rat rested in the flowerpot on the pedestal next 
to the recording box. 

Spike sorting and cell classification. Spike sorting was performed offline using 
graphical cluster-cutting software (tint, Neil Burgess, for Axona data; MClust, 
A. D. Redish, for Neuralynx data). Clustering was performed manually in two- 
dimensional projections of the multidimensional parameter space (consisting of 
waveform amplitudes and waveform energies), using autocorrelation and cross- 
correlation functions as additional separation tools and separation criteria. To ensure 
cells were not included in more than one data set, cells were compared across suc- 
cessive recording days. If two cells with similar spike clusters on the same tetrode 
had indistinguishable grid fields on two successive sessions, only one of them (the 
cell associated with the best coverage and the best signal-to-noise ratio) was included 
in further analysis. This approach drastically reduced the data set in some animals 
as large numbers of grids were recorded on multiple occasions (in one animal, 165/ 
341 cells were discarded because of identified repeats). 

Grid modules were defined by a k-means clustering algorithm as reported 
previously. The analysis was based on multidimensional data from each cell, 
consisting of grid spacing and grid orientation, as well as the eccentricity of the grid 
pattern. The k value was determined as the number of peaks detected in each ani- 
mal’s kernel smoothed density estimate of log grid spacing and grid orientation. 
Rate maps. Position estimates were based on tracking of the LEDs on the head 
stage connected to the drive. All data were speed filtered; only epochs with 
instantaneous running speeds of 5 cm s⁻¹ or more were included. 

To characterize firing fields, the position data were sorted into 3 cm × 3 cm bins. 
The path was smoothed with a 21-sample boxcar window filter (400 ms; 10 samples 
on each side). Firing rate distributions were then determined by counting the 
number of spikes in each bin as well as the time spent per bin. Maps for number of 
spikes and time were smoothed individually using a boxcar average over the 
surrounding 5 × 5 bins. Weights were distributed as follows: 

box = [0.0025 0.0125 0.0200 0.0125 0.0025;... 
       0.0125 0.0625 0.1000 0.0625 0.0125;... 
       0.0200 0.1000 0.1600 0.1000 0.0200;... 
       0.0125 0.0625 0.1000 0.0625 0.0125;... 
       0.0025 0.0125 0.0200 0.0125 0.0025] 
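
The binning and smoothing steps above can be sketched in Python as follows. This is a minimal illustration and not the authors' code: the function names, the zero padding at the map edges, and the default box size are assumptions; only the 3 cm bins and the 5 × 5 weights come from the text.

```python
import numpy as np

# 5 x 5 smoothing weights quoted in the text (they sum to 1).
BOX = np.array([
    [0.0025, 0.0125, 0.0200, 0.0125, 0.0025],
    [0.0125, 0.0625, 0.1000, 0.0625, 0.0125],
    [0.0200, 0.1000, 0.1600, 0.1000, 0.0200],
    [0.0125, 0.0625, 0.1000, 0.0625, 0.0125],
    [0.0025, 0.0125, 0.0200, 0.0125, 0.0025],
])


def smooth(m, kernel=BOX):
    """Weighted 5 x 5 average; zero padding at the edges is an assumption."""
    pad = kernel.shape[0] // 2
    padded = np.pad(m, pad, mode="constant")
    out = np.zeros(m.shape, dtype=float)
    for i in range(kernel.shape[0]):
        for j in range(kernel.shape[1]):
            out += kernel[i, j] * padded[i:i + m.shape[0], j:j + m.shape[1]]
    return out


def rate_map(pos_xy, spike_xy, dt, box_cm=150.0, bin_cm=3.0):
    """pos_xy, spike_xy: (n, 2) positions in cm; dt: seconds per tracking sample."""
    edges = np.arange(0.0, box_cm + bin_cm, bin_cm)
    occ, _, _ = np.histogram2d(pos_xy[:, 0], pos_xy[:, 1], bins=(edges, edges))
    spk, _, _ = np.histogram2d(spike_xy[:, 0], spike_xy[:, 1], bins=(edges, edges))
    time_s = smooth(occ * dt)   # smoothed time spent per bin
    spikes = smooth(spk)        # smoothed spike count per bin
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(time_s > 0, spikes / time_s, np.nan)
```

Spike and time maps are smoothed separately before division, as the text describes; bins with no smoothed occupancy are left undefined (NaN) in this sketch.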

Identification of grid cells. The structure of the rate maps was evaluated for all 
cells with more than 100 spikes on the baseline session by calculating the spatial 
autocorrelation for each smoothed rate map. The degree of spatial periodicity 
(‘gridness’ or ‘grid scores’) was determined by calculating the rotational symmetry 
of the cell’s spatial autocorrelogram, as described previously. A cell was defined as 
a grid cell if its grid score exceeded a chance level determined by repeated shuffling 
of the experimental data (100 permutations per cell). For each permutation, the 
entire sequence of spikes fired by the cell was time-shifted along the animal’s path 
by a random interval between 20 s and 20 s less than 
the length of the session, with the end of the session wrapped to the beginning. A 
rate map and a spatial autocorrelation map were then constructed, and a grid score 
was calculated for each permutation. If the grid score from the recorded data was 
larger than the 95th percentile of grid scores in the distribution from the shuffled 
data, the cell was defined as a grid cell. 
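
A minimal Python sketch of this shuffling criterion is given below. It assumes a user-supplied callable `score_fn` (hypothetical) that turns a spike-time array into a grid score, for example by building the rate map and autocorrelogram; the function names and the seed handling are also assumptions.

```python
import numpy as np


def shifted_spike_times(spike_t, session_len, rng, min_shift=20.0):
    """Circularly shift all spike times by one random interval in
    [min_shift, session_len - min_shift], wrapping the end to the beginning."""
    shift = rng.uniform(min_shift, session_len - min_shift)
    return np.sort((spike_t + shift) % session_len)


def is_grid_cell(spike_t, session_len, score_fn, n_perm=100, seed=0):
    """Compare the observed grid score with the 95th percentile of the
    scores obtained from n_perm time-shifted spike trains."""
    rng = np.random.default_rng(seed)
    observed = score_fn(spike_t)
    null = np.array([
        score_fn(shifted_spike_times(spike_t, session_len, rng))
        for _ in range(n_perm)
    ])
    threshold = np.percentile(null, 95)
    return observed > threshold, threshold
```

Shifting the whole spike train by one interval preserves its temporal structure while breaking its relation to position, which is why the shuffled scores serve as a chance distribution.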

Definition of grid orientation. For fine-grained analysis of the geometric features 
of the grid cells, we analysed grid orientation in the following manner. From the 
spatial autocorrelograms, we defined individual fields as neighbouring bins above a 
correlation criterion (0-0.5, depending on scale). We then computed the centre of 
mass for each of the six fields closest to the autocorrelogram centre field (‘the inner 
ring’). As spatial autocorrelograms are symmetric, we defined 3 axes for further 
analysis. We first determined which inner field centre in the inner ring was closest 
in absolute angular distance to the nearest wall and labelled this axis 1. The two 
nearest axes were labelled axis 2 (most positive) and axis 3 (most negative). 
Estimate of grid orientation and rotational offset. The minimal offset of any 
grid axes to any of the four walls in the box was determined as follows. From the 
matrix of grid orientation values across all cells, X (size: n × 3), we identified the 
absolute angle to the nearest wall (in degrees): 

$$A = 45 - \left| (X \bmod 90) - 45 \right|$$

We then sorted the grid axes according to A per cell and defined Amin to be the 
smallest offset across axes per cell. By referring back to the original grid 
orientation for the sorted axes values in A, we could map each cell’s anchoring (Amin) to a 
particular wall pair (original axis closest to 0°: east-west axis; closest to ±30°: 
north-south axis). Rayleigh tests for circular uniformity are referred to as ‘Rayleigh test’ in 
the main text and figure legends. 
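
The offset formula above translates directly into NumPy. The function name `wall_offsets` is illustrative; the input is assumed to be an n × 3 array of axis orientations in degrees.

```python
import numpy as np


def wall_offsets(X):
    """Absolute angle (degrees) of each grid axis to the nearest wall,
    A = 45 - |(X mod 90) - 45|, and the per-cell minimum across axes."""
    X = np.asarray(X, dtype=float)
    A = 45.0 - np.abs(np.mod(X, 90.0) - 45.0)
    return A, A.min(axis=1)
```

For a cell with axes at 7.5°, 67.5° and 127.5°, the per-axis offsets are 7.5°, 22.5° and 37.5°, so the minimal offset is 7.5°.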

Kernel smoothed density estimate of grid orientation. Kernel smoothed density 
estimates of grid orientation were computed by considering 100 equally 
spaced points that cover the range of data in x. Given samples x_1, ..., x_n, 

$$\hat{f}(x) = \frac{1}{n} \sum_{i=1}^{n} K_\sigma(x - x_i)$$

where K is the Gaussian kernel 

$$K_\sigma(u) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{u^2}{2\sigma^2}\right)$$

with width σ. 
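
As a rough illustration, the estimator can be written in a few lines of NumPy. The function name is an assumption, and the sketch treats the data as linear, so it does not wrap the kernel around the circular boundary.

```python
import numpy as np


def gaussian_kde_1d(samples, sigma, n_points=100):
    """Evaluate a Gaussian kernel density estimate on n_points equally
    spaced points covering the range of the samples."""
    samples = np.asarray(samples, dtype=float)
    grid = np.linspace(samples.min(), samples.max(), n_points)
    u = grid[:, None] - samples[None, :]
    k = np.exp(-0.5 * (u / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return grid, k.mean(axis=1)
```

The kernel widths quoted in the figure legends (for example 2.5° or 7.5°) would be passed as `sigma`.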

Ellipse fitting. We fitted ellipses to the grid pattern using a direct least squares 
fitting approach as described in ref. 12. For each cell, an ellipse was fitted to the x and 
y coordinates of the six innermost detected field centres of the autocorrelogram. 
Briefly, a general conic is represented by an implicit second order polynomial: 

$$F(A, X) = A \cdot X = ax^2 + bxy + cy^2 + dx + ey + f = 0$$

where X = [x^2  xy  y^2  x  y  1]^T and A = [a  b  c  d  e  f]^T. 
The fitting of a general conic can be achieved by minimizing the sum of squared 
algebraic distances: 

$$D_A(A) = \sum_{i=1}^{N} F(X_i)^2$$

For direct ellipse-specific fitting, the quadratic constraint 4ac − b^2 = 1 was imposed 
and expressed in matrix form as A^T C A = 1, with 

$$C = \begin{bmatrix} 0 & 0 & 2 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 & 0 & 0 \\ 2 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

after which the constrained ellipse fitting problem was reduced to minimizing 
$E = \lVert DA \rVert^2$, where D is the design matrix whose rows are the vectors X_i^T, 
subject to the constraint 

$$A^T C A = 1$$
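
One way to solve this constrained problem numerically is sketched below. The sketch uses the partitioned form of the direct least squares fit (splitting the conic into quadratic and linear parts), which is a common numerically stable variant and an assumption here, not necessarily what the authors ran. Function names are illustrative. The returned coefficients are unit-normalized instead of being scaled to 4ac − b² = 1, which leaves the ellipse geometry unchanged.

```python
import numpy as np


def fit_ellipse(x, y):
    """Direct least squares ellipse fit; returns conic coefficients
    [a, b, c, d, e, f], sign-normalized so that a + c > 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    D1 = np.column_stack([x * x, x * y, y * y])    # quadratic part
    D2 = np.column_stack([x, y, np.ones_like(x)])  # linear part
    S1, S2, S3 = D1.T @ D1, D1.T @ D2, D2.T @ D2
    T = -np.linalg.solve(S3, S2.T)
    M = S1 + S2 @ T
    # Left-multiply by inv(C1), with C1 = [[0, 0, 2], [0, -1, 0], [2, 0, 0]].
    M = np.array([M[2] / 2.0, -M[1], M[0] / 2.0])
    _, vecs = np.linalg.eig(M)
    vecs = np.real(vecs)
    cond = 4.0 * vecs[0] * vecs[2] - vecs[1] ** 2  # 4ac - b^2 per eigenvector
    a1 = vecs[:, np.argmax(cond)]                  # the ellipse solution
    coef = np.concatenate([a1, T @ a1])
    return coef if coef[0] + coef[2] > 0 else -coef


def ellipse_axes(coef):
    """Semi-major axis, semi-minor axis and tilt (degrees, in [-90, 90))."""
    a, b, c, d, e, f = coef
    Q = np.array([[a, b / 2.0], [b / 2.0, c]])
    lin = np.array([d, e])
    centre = np.linalg.solve(2.0 * Q, -lin)
    f0 = centre @ Q @ centre + lin @ centre + f    # conic value at the centre
    lam, vec = np.linalg.eigh(Q)                   # ascending eigenvalues
    alpha = np.sqrt(-f0 / lam[0])
    beta = np.sqrt(-f0 / lam[1])
    tilt = np.degrees(np.arctan2(vec[1, 0], vec[0, 0]))
    return alpha, beta, (tilt + 90.0) % 180.0 - 90.0
```

Applied to the six innermost field centres of an autocorrelogram, `ellipse_axes(fit_ellipse(x, y))` yields the quantities from which tilt and elongation are derived.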


To determine the relationship between ellipticity and grid rotation, we estimated 
grid distortion by defining the lengths of the semi-major and semi-minor axes as α 
and β, respectively. To quantify the amount of grid pattern distortion, we used two 
ellipse measures. From the ellipse fit, we determined ellipse orientation or ellipse tilt 
(orientation of semi-major axis), ellipticity, and eccentricity. 

Ellipse eccentricity was calculated as: 

$$\text{eccentricity} = \sqrt{1 - \frac{\beta^2}{\alpha^2}}$$

where α is the ellipse semi-major axis and β the semi-minor axis of the ellipse. 
Ellipse eccentricity ranges between 0 and 1 and describes how much a conic 
section deviates from perfect circularity (where 0 represents a perfect circle and 1 the 
limit at which the ellipse becomes parabolic). 

We also used a simpler measure with more intuitive properties in terms of 
translating ellipse elongation to grid pattern distortion, defined as ellipticity 

$$\text{ellipticity} = \frac{\alpha}{\beta}$$

where α is the semi-major and β the semi-minor ellipse axis. Ellipticity ranges from 
1 to ∞, where 1 represents a perfect circle and ∞ the limit at which the ellipse 
becomes parabolic. Ellipticity describes elongation of the ellipse (length of semi-major 
axis compared to the semi-minor axis). Ellipse orientation was expressed as the 
angle (between −90 and 90 degrees) of the ellipse semi-major axis to zero. Grid 
cells had an average ellipticity of 1.17 ± 0.004 and eccentricity of 0.48 ± 0.005 
(mean ± s.e.m.). 
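
The two measures are related through the axis ratio, as this short sketch shows (function names are illustrative):

```python
import math


def eccentricity(alpha, beta):
    """sqrt(1 - beta^2 / alpha^2) for semi-major alpha and semi-minor beta."""
    return math.sqrt(1.0 - (beta / alpha) ** 2)


def ellipticity(alpha, beta):
    """Ratio of the semi-major to the semi-minor axis."""
    return alpha / beta
```

A single ellipse with ellipticity 1.17 has an eccentricity of about 0.52. The reported mean eccentricity (0.48) is averaged across cells, and because the relation is nonlinear the two population means need not map onto each other exactly.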

Deformation by shearing. Shearing was performed, for each cell, along shear 
axes parallel to the two box cardinal axes. The grid pattern used in these analyses 
was defined by the 3 grid axes detected from the innermost hexagon of field 
centres in the spatial autocorrelogram for each cell. The shear transform was defined 
for two dimensions by: 

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 1 & \gamma_x \\ \gamma_y & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$

where γ_y is the shear parameter along the y axis and γ_x along the x axis, and x and y 
are row vectors of initial coordinates of points in the plane. For each cell and shear 
axis, shearing was performed across a range of shear parameters γ (−0.55 to 0.55 in 
0.001 steps). We then calculated the grid ellipticity resulting from each transform. 
The shear that produced the smallest ellipticity value was defined as the optimal 
shear parameter. We then calculated updated grid axes from this optimal shear in 
each direction. 
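
The search for the optimal shear parameter can be sketched as a brute-force scan. As a simplification (an assumption of this sketch), ellipticity is obtained from an origin-centred conic fitted by ordinary least squares, which is adequate because autocorrelogram fields are symmetric about the origin; it stands in for the constrained fit described under ellipse fitting. All function names are illustrative.

```python
import numpy as np


def ellipticity_centred(pts):
    """Ellipticity of the origin-centred ellipse a x^2 + b xy + c y^2 = 1
    fitted to the points in pts (shape (n, 2))."""
    x, y = pts[:, 0], pts[:, 1]
    design = np.column_stack([x * x, x * y, y * y])
    (a, b, c), *_ = np.linalg.lstsq(design, np.ones(len(x)), rcond=None)
    lam = np.linalg.eigvalsh(np.array([[a, b / 2.0], [b / 2.0, c]]))
    return np.sqrt(lam[1] / lam[0])


def shear(pts, gamma, axis):
    """Shear along the x axis (x + gamma * y) or the y axis (y + gamma * x)."""
    x, y = pts[:, 0], pts[:, 1]
    if axis == "x":
        return np.column_stack([x + gamma * y, y])
    return np.column_stack([x, y + gamma * x])


def optimal_shear(pts, axis, gammas=np.linspace(-0.55, 0.55, 1101)):
    """Shear parameter (0.001 steps) giving the smallest ellipticity."""
    ell = np.array([ellipticity_centred(shear(pts, g, axis)) for g in gammas])
    best = np.argmin(ell)
    return gammas[best], ell[best]
```

For a perfect hexagon that has been sheared by 0.13 along the x axis, the scan returns approximately −0.13, the shear that restores a circular fit.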

Simulations. To simulate grid maps for square environments, a probability 
density function was generated for each cell in which Gaussian modes were distributed 
according to an equilateral grid pattern. Each mode was described by a 
two-dimensional Gaussian function: 

$$f(x, y) = A \exp\left(-\left(\frac{(x - x_0)^2}{2\sigma_x^2} + \frac{(y - y_0)^2}{2\sigma_y^2}\right)\right)$$

where A is the amplitude, x_0 and y_0 are the coordinates for the peak location, and σ_x^2 
and σ_y^2 the Gaussian variance in the two dimensions. We set σ_x = σ_y. σ^2 was identical 
for each mode and determined by the inter-peak distance: (inter-peak distance/5)^2. 
Locations of nodes were determined from: 

$$z = d e^{i\theta} = d(\cos\theta + i\sin\theta)$$

where z is complex such that peak locations in the x,y plane are given by x = real(z) 
and y = imag(z), and 

$$d = \ell \sqrt{(a - c)^2 + ac}$$

$$c = b \bmod a$$

where ℓ is grid spacing, ψ grid orientation in radians, a the index for identity of the 
hexagonal ring of nodes and b the index for the identity of the node on that ring. To 
simulate alignment of the grid to a box axis, ψ = 0 or π/6. To shear the grid pattern, 
the shearing transform was applied to the real and imaginary components of points 
in z (above). 
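
A simulated grid map of this kind can be sketched as below. Note the swapped-in construction: instead of the ring-index formulation in the text, the nodes are generated from two lattice basis vectors 60° apart, which yields the same equilateral lattice. Function names, the number of rings and the evaluation lattice size are assumptions; the Gaussian width follows the inter-peak distance divided by 5.

```python
import numpy as np


def hex_nodes(spacing, psi, n_rings=3):
    """Nodes of an equilateral lattice with the given spacing, rotated by
    psi radians, built from two basis vectors 60 degrees apart."""
    idx = np.arange(-n_rings, n_rings + 1)
    i, j = np.meshgrid(idx, idx, indexing="ij")
    z = spacing * (i + j * np.exp(1j * np.pi / 3)) * np.exp(1j * psi)
    return np.column_stack([z.real.ravel(), z.imag.ravel()])


def grid_map(nodes, spacing, half_width=1.0, n=101, amplitude=1.0):
    """Sum of isotropic Gaussian modes (sigma = spacing / 5) evaluated on an
    n x n lattice covering [-half_width, half_width] in both dimensions."""
    sigma = spacing / 5.0
    ax = np.linspace(-half_width, half_width, n)
    xx, yy = np.meshgrid(ax, ax, indexing="xy")
    out = np.zeros_like(xx)
    for x0, y0 in nodes:
        out += amplitude * np.exp(
            -((xx - x0) ** 2 + (yy - y0) ** 2) / (2.0 * sigma ** 2)
        )
    return out
```

Shearing the simulated grid then amounts to transforming the node coordinates before calling `grid_map`.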

Interactive shearing patterns. We simulated the effect of multiple interactive 
shear forces on grid geometry. Simulated grids were generated as above. Grid 
spacing, ℓ, was set to 0.45, and box frame as [−1 −1 1 1 −1; −1 1 1 −1 −1], defining a 
square box with sides = 2. Only fields that fell within this box were used. A simple 
shear transform along the y axis involves adding the x component multiplied by the 
shear parameter γ to the y component of every ordered pair in [x, y]: 

$$f(x, y) = \begin{bmatrix} (x - 1) \\ (y - 1) + \gamma x \end{bmatrix}$$

The (x − 1) and (y − 1) terms move the shear origin (anchoring point) to the 
[−1, −1] corner of the box. 
To simulate shearing from two orthogonal walls we set 

$$f(x, y) = \begin{bmatrix} (x - 1) + \gamma_1 y \\ (y - 1) + \gamma_2 x \end{bmatrix}$$

where γ_1 was set to tan(π/24) as the original shear parameter such that an offset of 
7.5° was induced along one axis. We then systematically varied γ_2 between 
[−0.35, 0.35] in 0.025 steps. Rate maps were generated using the two-dimensional 
kernel smoothed density estimator described above (Gaussian kernel width: 0.1) 
within the box frame (62,500 evenly spaced evaluation points) and spatial 
autocorrelograms were generated from these. We then determined grid features from 
these autocorrelograms as described above. 
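
The choice of tan(π/24) can be checked with a few lines of Python. The sketch applies a simple shear along the y axis to a unit vector on the x axis and reads off the induced rotation; it also enumerates the scanned values of the second shear parameter. The names are illustrative, and reading the 62,500 evaluation points as a 250 × 250 lattice is an assumption.

```python
import math


def shear_y(x, y, gamma):
    """Simple shear along the y axis: add gamma * x to the y component."""
    return x, y + gamma * x


def induced_offset_deg(gamma):
    """Rotation (degrees) of an axis initially parallel to the x axis."""
    x, y = shear_y(1.0, 0.0, gamma)
    return math.degrees(math.atan2(y, x))


GAMMA_1 = math.tan(math.pi / 24)  # shear that rotates an x-aligned axis by 7.5 degrees
GAMMA_2_VALUES = [round(-0.35 + 0.025 * k, 3) for k in range(29)]  # -0.35 ... 0.35
```

`induced_offset_deg(GAMMA_1)` evaluates to 7.5, and the scan over the second parameter comprises 29 values.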

To simulate symmetric shearing from two opposite (east-west) walls, we set 

$$f(x, y) = \begin{bmatrix} x(1 + \gamma) \\ y(1 + \gamma x) \end{bmatrix}$$

and calculated grid features from the resulting maps. To generate grids that were 
fractured along the diagonal, we first generated a sheared grid according to: 

$$f(x, y) = \begin{bmatrix} (x - 1) \\ (y - 1) + \gamma x \end{bmatrix}$$

We then discarded all the points x + y < 0 (below the diagonal of the box), and 
added a copy of the remaining points mirrored around that diagonal 

$$f(x_s, y_s) = \begin{bmatrix} -y_s \\ -x_s \end{bmatrix}$$

where x_s and y_s represent the sheared points from the original matrix [x, y], whose 
sum was > 0. 

Local anchoring. To generate subcompartment maps, the firing rate map of each 
grid cell was divided into 4 or 9 equal quadrants. This analysis was done on 5 ani- 
mals (3 from the 1.5 m box and 2 from the 2.2 m box; 17 modules in total) in which 
4 modules were recorded. Then autocorrelograms for these were made, and grid 
features determined according to methods described above. Only modules that reli- 
ably yielded 6 detectable fields were used for analyses. 
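
The partition into subcompartments can be illustrated with a short NumPy helper (the name `split_map` is hypothetical). It splits a rate map into an n × n arrangement of equal blocks, so n = 2 gives 4 subcompartments and n = 3 gives 9; an autocorrelogram would then be computed for each block.

```python
import numpy as np


def split_map(rate_map, n_per_side):
    """Split a 2-D rate map into n_per_side x n_per_side blocks,
    returned row by row."""
    rows = np.array_split(rate_map, n_per_side, axis=0)
    return [blk for r in rows for blk in np.array_split(r, n_per_side, axis=1)]
```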

Histology and reconstruction of tetrode placement. Histology and reconstruction 
of electrode positions were performed as described previously, using flatmaps 
for two-dimensional localization of electrodes within the MEC and adjacent 
parasubiculum. 

Statistical tests. Statistical tests were two-sided unless otherwise specified. No sta- 
tistical methods were used to predetermine sample size. 

Code availability. Code for shearing transformations can be obtained from the 
authors. 
Extended Data Figure 1 | Alignment of individual grid axes to box 
geometry. a, Interpretation of alignment configurations. Bimodal distributions 
(red curves) around the two cardinal axes (bottom) can be mapped as unique 
alignment solutions to the cardinal axes x and y of a square box (top row). 
Orange triangles show how the basic unit of the grid pattern can be uniquely 
aligned to either cardinal axis (thick solid black lines). b, Distribution of angular 
offset for all 3 grid axes of all cells recorded in the 1.5m box. Amount of data 
are shown for the two symmetric alignment configurations 0° and 15° 
(symmetry between box geometry and grid) and for the offset where overlap 
between box and grid axes is minimized (7.5°). For each cell, grid axes were 
sorted according to angular distance from the nearest wall (labelled axis 1-3 
with increasing wall-distance (top inset)). The absolute offset from the nearest 
multiple of 60° was calculated for each axis as a measure of the deviation from a 
perfectly hexagonal grid parallel to the east-west axis (bottom inset). Frequency 
plots show these distributions with 3° wide bins (x axes). The offset for grid 
axis 1 (Amin) was consistently larger than for axes 2 and 3. For axis 1, 33.6% of 
the absolute values for Amin fell within 7.5 ± 1.5°, 5.8% within 0-1.5°, and 
only 2.9% within 15 ± 1.5°. c, Schematic illustrating that the farthest away any 
axis of a perfect grid can be from any wall in the environment is 15° because 
of the symmetries inherent in the grid pattern and the square box. Because 
the offset distribution is constrained to [0, 15]°, tests against uniformity 
were performed by multiplying the offsets by 24 to achieve a 360° range. 
d, Frequency plot of grid orientation for axes 1-3 for all cells in the 1.5 m 
environment after correcting for alignment direction (all cells with Amin < 0 
were multiplied by −1 across the alignment axis (reflected)). Distance to Wn is 
indicated by red lines for each axis. e, Frequency plot showing distribution 
width, or variance, for grid axes as a function of distance from Wn in the 1.5 m 
environment. Distributions of the 3 grid axes were sorted by distance to Wn 
(as in b) and are shown in distinct colours (as in Fig. 1g). f, Polar scatterplot 
(same data as in Fig. 1b) showing axis-specific offset from Wn or 60° multiple in 
the 1.5 m box. All grids with Amin < 0 have been reflected around the Amin 
wall axis in order to re-align axes to one absolute offset solution. Colours as in 
Fig. 1g. Orange dashed lines show peak angular offset from 60° multiples 
(black lines) of parallel wall alignment (0°) for each axis. Orange wedges 
highlight the angle between each axis and its nearest 60° multiple of 0°. 

g, Relationship between expected angular offset from Wn for a perfectly 
hexagonal grid with 7.5° angular offset on axis 1, and observed offset from 60° 
multiple for each grid axis in the 1.5 m box (mean of all cells). Least square 
regression line is indicated. Note that the observed offset decreases as the offset 
from Wn approaches its maximum at the diagonal (45°). h, Frequency plots 
showing only slightly larger angular offset along axis 1 (the axis nearest Wn) 
than axes 2 and 3 in the 2.2 m environment. For axis 1, a total of 7.7% of the 
cells from the 2.2 m box had minimal angular offsets within 0-1.5° of the 
nearest wall, 26.8% had offsets of 7.5 ± 1.5°, and only 6.8% were rotated 
15 ± 1.5°. In 203/220 cells, grid orientation could be referred back to either of 
3 distinct anchoring solutions. The mean offsets from the walls were similar, 
all centering around 7.5° (mean ± s.e.m. Amin: north-south wall negative offset 
(n = 108), 7.54 ± 0.03°; positive offset (n = 58), 6.35 ± 0.07°; east-west 
wall negative offset (n = 37), 8.16 ± 0.12°). Offsets were largely unimodal in the 
2.2 m box (Fig. 3c), in contrast to the 1.5 m environment (Fig. 1c). 
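
The rescaling described in panel c can be combined with a Rayleigh test as in the sketch below. Offsets in [0, 15]° are multiplied by 24 to span the full circle before the mean resultant length is computed. The function name is illustrative, and the P value uses the first-order large-sample approximation exp(−Z), which is an assumption of this sketch; the exact correction used in the analysis is not stated in the text.

```python
import numpy as np


def rayleigh_offsets(offsets_deg, factor=24.0):
    """Rayleigh statistic Z = n * R^2 for offsets rescaled to a 360 degree
    range, with the approximate P value exp(-Z)."""
    ang = np.radians(np.asarray(offsets_deg, dtype=float) * factor)
    n = ang.size
    r = np.hypot(np.cos(ang).sum(), np.sin(ang).sum()) / n  # mean resultant length
    z = n * r ** 2
    p = min(1.0, float(np.exp(-z)))
    return z, p
```

Offsets clustered near 7.5° map to angles clustered near 180° and give a large Z, whereas offsets spread evenly across [0, 15)° give Z close to zero.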
Extended Data Figure 2 | Rate map and spatial autocorrelation diagram 
for representative cells from all animals. The 587 cells from the 1.5 m 
environment were ranked according to time of recording for each animal and 
every tenth cell on the list was then selected. Cells are sorted by animals. For 
each cell, the rate map is shown at the top and the autocorrelogram at the 
bottom. Colour scale, 0 Hz to peak rate for rate maps; [—1,1] for 
autocorrelation maps. Peak rate is indicated above the rate map and gridness 
below the autocorrelogram. Correlation is indicated by scale bar. Peak rate is 
indicated at the bottom of each rate map; angular offset at the top right of each 
autocorrelogram. Grid orientation and grid ellipticity were determined from 
the innermost hexagon of vertices in the autocorrelogram, as in Fig. 1a. 
Extended Data Figure 3 | Distribution of grid orientation (Amin) across 
individual animals in the 1.5 m environment. a, Frequency distributions 
showing clustering of grid orientation around each of the 3 axes defined by 60° 
multiples of the east-west box axis (number of cells as a function of 
orientation). Each row shows one animal; animal number is indicated at the top 
left. Rat 13960 had two distinct data sets (1 and 2, respectively). b, Polar 
scatterplots showing distribution of grid orientation and grid spacing for the 
entire sample of grid cells in each animal. Grid spacing is indicated on the radial 
axis. For each cell, the location of the 6 innermost fields in the spatial 
autocorrelogram is shown, as in Fig. 1a, b. The cardinal axes of the recording 
environment are shown as black lines. Note similar offset from the east-west 
axis in at least 6 out of 7 data sets (all animals except 14146). The deviation 
from the parallel configuration (0°) was consistent across animals. Excluding 
cells with fewer than 6 grid fields did not cause any major change of the 
angular offset (mean ± s.e.m. of remaining cells: 7.5 ± 3.3°). 
Extended Data Figure 4 | Modular organization of grid alignment. 
a-c, Data from 1.5 m environment. a, Polar scatterplot of all cells recorded in 
the 1.5 m box (grey), as in Fig. 1b. Superimposed are all points from one animal 
(orange points with multiplicative colour). Green ellipses indicate 3 different 
modules. Different modules assumed either of the two alignment 
configurations around the east-west box axis, that is, distributions for 
individual modules were always unimodal. b, Kernel smoothed density curve 
showing frequency of observations across values of Amin (all cells; kernel width: 
2.5°). Coloured lines show means of individual grid modules. Colours 
correspond to different animals. Module means showed the same distribution 
trend as the pooled data. The average of the smallest angular offset (Amin) 
with grid modules as the unit of analysis was similar to the average of the pooled 
data (mean ± s.e.m.: 6.3 ± 0.2°). Both within and across animals, the 
distribution was bimodal. c, To visualize cross-animal clustering of grid 
orientation, for each grid module we counted the number of other modules 
with mean grid orientations that distributed within a narrow range of the mean 
orientation of the reference module (± 2.5°). Left plot shows for each module 
(in individual rows) which other module means fell within ± 2.5°. Modules 
were sorted according to mean grid orientation and assigned a rank value 
(module ID). Each module has a unique colour based on module ID (colour bar). 
The grid orientation of a module was typically shared by several other 
modules (4.2 ± 1.9 module means within ± 2.5°), as indicated by many long 
rows as well as consistent clustering of colour across rows. Bottom plot shows 
cumulative distribution of the same data. Taken together, the data suggest that 
the number of anchoring solutions is limited and that solutions are reused 
across modules. d-f, Data from 2.2 m environment. d, Polar scatterplot with an 
individual animal highlighted in purple. e, Distribution of mean values for 
grid orientation across modules, as in b. The average minimal orientation offset 
across module means in the 2.2 m box was 7.2 ± 0.3° (mean ± s.e.m.). 
f, Clustering of mean grid orientation across modules. Notation as in c. 
g, h, Distortion and rotation of grid pattern are independent of grid spacing. 
g, Scatterplots showing relationship between module identity and grid offset 
(left), optimal shearing parameter (middle), and grid ellipticity (right). 
Each circle corresponds to one cell. Dark red lines show means ± s.d. for 
successive modules based on pooled data of individual cells. Blue lines show 
means ± s.d. for mean values of each module (slightly offset in x direction 
for clarity). Data from 1.5 m and 2.2 m environments are combined. Note 
lack of effect of module identity on angular offset, shearing parameter or 
ellipticity. h, Same plots as in g, but binned by grid spacing. Spacing range was 
partitioned into 4 equal parts. The values underneath each panel show the 
centre value of each partition. The absolute offset decreased only minimally 
with grid spacing (repeated-measures Kruskal-Wallis test with 4 bins of 
grid spacing as the within-subjects factor: H = 0.8, d.f. = 3, P = 0.85; mean 
offset for modules M1-M2: 7.3°; mean offset for M3-M4: 6.9°). 
Extended Data Figure 5 | Grid deformation is constrained by 
environmental geometry. a-c, Data from 1.5 m environment. a, Frequency 
plot showing orientation (tilt) of the semi-major axis of an ellipse fit to the inner 
hexagon of firing vertices in each cell’s spatial autocorrelogram (all 587 cells 
that were recorded in the 1.5 m box). For clarity, ellipse values are shown over 
the entire 360° range (each cell thus contributes 2 data points as unique ellipse 
orientation values distribute within a 180° range). The superimposed curve 
shows a kernel smoothed density estimate of the same data (Gaussian kernel 
width: 7.5°). The distribution of ellipse orientation was highly non-uniform 
both within and between animals (Rayleigh test in the entire cell sample: 
Z = 12.8, P = 2.5 × 10⁻⁶ for (ellipse orientation distribution) × 2). Two 
unique modes within [—90, 90]° were identifiable. Modes were defined as all 
the values that fell between the two neighbouring troughs (red lines that mark 
the boundary of each mode-distribution, labelled 1 and 2, respectively). Cells 
within modules shared ellipse orientation. On average, for one mode, the long 
axis of the ellipse was offset by 27.7 ± 1.1° from the east-west axis of the 
recording box; for the other mode, the offset was 17.9 ± 1.4° from the 
north-south axis (means ± s.e.m.). To determine if the spread of ellipse orientation 
was smaller than expected by chance, we drew, for each module, m random 
ellipse orientation values from the pool of all cells (n = 587), and determined 
the circular spread (standard deviation) of that distribution. m was the number 
of cells in the module. The procedure was repeated 300 times per module to 
allow calculation of z scores. The circular spread of ellipse orientation within 
modules was significantly lower than expected by chance (mean ± s.e.m.
z = 4.53 ± 1.33; t(17) = 3.37, P = 0.006, one-sample t-test). b, Frequency of the
same data as in a, but modulo 90° to control for differential ellipse alignment 
(modulo 90° brings out trends common to each of the 4 walls in the box). 

A single peak was detectable at 65.4° (red line). c, Minimal wall offset (Amin) as a
function of ellipse orientation for each cell. Values are shown for an ellipse 
range of 360°, such that each cell contributes 2 points. Each mode of the 
distribution of ellipse orientations consisted primarily of cells from one of the 
two configurations for grid orientation (positive or negative offset). 

d-g, Comparison of deformation in the 1.5 and 2.2 m boxes. d, Circular 
histograms showing ellipse orientation for all cells recorded in the 1.5 m box
(left) and the 2.2 m box (right). Ellipse orientations were multiplied by 2 to 
achieve a 360° range. Although there was clear clustering around two modes in 
the smaller box, no clustering was apparent in the data from the large box. 


e, Same as in d, but for ellipse orientation modulo 90° and multiplied by 4 to 
achieve a 360° range (left, 1.5 m box; right, 2.2 m box). The modulo operation
brings out distribution trends that are similar for the 4 walls in the box. No such 
trends were apparent for the 2.2 m environment. Mean vector length for 

the distributions is shown in the centre of each plot (red solid line and axis). 
f, Kernel smoothed density curves from ellipse orientation in all cells recorded 
in the 1.5 m box (left) and the 2.2 m box (right). We identified all peaks (orange 
lines) within [−90, 90]° (light areas), and then for each peak calculated the
log ratio between the peak and the smaller of its two neighbouring troughs 
(green lines). While only 2 peaks were identified for ellipse orientation in 

the 1.5 m box (a), 5 peaks were present in the 2.2 m box. g, Log peak-trough 
ratios from the values calculated in f (means ± s.e.m.). Values were significantly
lower in the 2.2 m environment than the 1.5 m environment (two-sample
t-test, t(5) = 3.51, P = 0.02). h, Scatterplot showing, for all cells in the 2.2 m
environment, grid orientation from the grid axis second furthest away from the 
nearest wall (Wy) versus absolute ellipse orientation. There was a strong 
linear correlation between these parameters (r = 0.41, P = 3.82 X 10 7°), as 
expected if interactions from a second shear axis had occurred (linear 
regression is shown as red line). i,j, Distribution of angular offset after shearing 
along each of the cardinal axes of the 1.5 m box. i, Frequency plots showing 
minimal absolute offset from multiples of 60° of the parallel (0°) configuration 
for all cells before shearing (left) and after shearing in the north-south (y) 
and east-west (x) directions (middle and right, respectively; insets). For each 
box axis, we used the shear transform that minimized ellipticity. Before 
shearing, the distribution showed a clear offset (left). After shearing in the y 
direction (north-south; middle panel), the angular offset was eliminated. 
Shearing along the x axis (east-west) did not reduce the offset (right), despite 
similar reduction of the ellipticity. Note that, in simulated grids, the shear 
parameter γ needed to rotate a perfectly hexagonal grid from 0° to 7.5°
(γ = tan(π/24)) induced ellipticity of 1.16, a value almost identical to the mean
ellipticity observed in the data (1.17). j, Kernel smoothed density estimates
of Amin (minimal offset from any wall) for all cells (Gaussian kernel width: 2.5°)
after shearing along both the x and y axis (left and right panel, respectively). 
There were two distinct modes after shearing in the x direction. The 
distribution for the y direction, in contrast, became unimodal and centred 

on the east-west box axis (0°), suggesting that the bimodal distribution of 
angular offset was caused by shearing in this direction. 
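The shuffling control described in panel a can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, sampling is assumed to be without replacement, ellipse orientations are assumed to repeat every 180°, and the sign of the z score is chosen so that positive values mean a tighter spread than chance.

```python
import math
import random


def circular_sd_deg(angles_deg, period=180.0):
    """Circular standard deviation (degrees) of angles that repeat every `period` degrees."""
    k = 360.0 / period  # stretch the data onto a full circle
    n = len(angles_deg)
    c = sum(math.cos(math.radians(k * a)) for a in angles_deg) / n
    s = sum(math.sin(math.radians(k * a)) for a in angles_deg) / n
    r = min(max(math.hypot(c, s), 1e-12), 1.0)  # mean vector length, clamped for log()
    return math.degrees(math.sqrt(-2.0 * math.log(r))) / k


def module_z_score(module_angles, pool_angles, n_shuffles=300, seed=0):
    """z score of a module's circular spread against random draws from the full pool."""
    rng = random.Random(seed)
    m = len(module_angles)  # number of cells in the module
    observed = circular_sd_deg(module_angles)
    # Assumption: draw m values without replacement from the pool of all cells.
    null = [circular_sd_deg(rng.sample(pool_angles, m)) for _ in range(n_shuffles)]
    mean = sum(null) / n_shuffles
    sd = math.sqrt(sum((x - mean) ** 2 for x in null) / (n_shuffles - 1))
    # Assumption: positive z means the module is tighter than chance.
    return (mean - observed) / sd
```

The resulting per-module z scores could then be tested against zero with a one-sample t-test, as in the caption.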
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Extended Data Figure 6 | Observed grid deformation and rotational offset 
cannot be reproduced by mere diagonal compression, where forces from the 
corners move fields perpendicularly towards the diagonal. a, Diagonal 
compression model showing forces acting on a shrinking pattern that is 
anchored to opposite corners. Top left, square lattice (no elongation) 
representing starting point in novel environment. Top right, vector field (red 
arrows) of proposed forces resulting from pattern shrinkage given anchoring to 
opposite corners (red dots). Without elongation in novelty, pattern shrinkage 
during familiarization to the environment is hypothesized to be radially 
uniform. However, corner anchoring will stop shrinkage along the diagonal of 
the anchoring corners. This results in a compression of the pattern along the 
opposite diagonal. Bottom left, a simulated grid pattern that was uniformly 
enlarged (as indicated by the perfect circle fit) to mimic a novelty response. 
Upon familiarization to the environment (shrinkage, bottom right), and in the 
presence of anchoring to opposite corners, the pattern undergoes diagonal 
compression. The resulting pattern has an angular offset (here 7.5°) and is 
accompanied by pattern deformation. Force vectors are shown as red arrows 
inside the box, and component vectors are shown as red arrows outside the box. 
The diagonal compression transform is described by: 
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b, Comparison of grid pattern characteristics after shearing and after diagonal 
compression, both aimed at achieving a 7.5° angular offset. Black lines and dots 
show the original symmetric and aligned grid pattern. Red dots and ellipse 
denote pattern deformation imparted by shearing, while blue dots and ellipse 
show deformation caused by diagonal compression. 7.5° offset is indicated by a 
dashed black line. Note the pronounced elliptical deformation of the diagonally 
compressed pattern compared to the sheared pattern. c, Histogram of ellipticity 
values observed in 1.5 m and 2.2 m recording environments. The median is 
indicated by a black line. Red and blue lines show ellipticity after 7.5° offset 
induced by shearing and diagonal compression, respectively. The ellipticity 
induced by shearing matches the data almost perfectly, while the value for 
diagonal compression is much higher than observed. Together, these analyses 
suggest that although diagonal compression can induce both angular offset and 
elliptic deformation, the relationship between these variables does not render 
this transformation a likely candidate for explaining the observed grid 
alignment. In contrast, the relationship between ellipticity and offset following 
shearing closely matches observation (Extended Data Fig. 7), suggesting that 
this is indeed the process that underlies the alignment process. 
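The link between shear-induced offset and ellipticity can be illustrated with a short calculation. The sketch below assumes a simple shear along the north-south (y) axis and takes ellipticity to be the ratio of the singular values of the transform, that is, the axis ratio of the ellipse into which a circle through the six inner vertices is mapped. Function names are hypothetical, and the idealized ratio need not coincide exactly with values obtained from ellipse fits to simulated autocorrelograms.

```python
import math


def shear_along_y(gamma):
    """2x2 shear matrix: (x, y) -> (x, y + gamma * x)."""
    return ((1.0, 0.0), (gamma, 1.0))


def transform(m, point):
    """Apply a 2x2 matrix to a 2D point."""
    (a, b), (c, d) = m
    x, y = point
    return (a * x + b * y, c * x + d * y)


def axis_ratio(m):
    """Ratio of the larger to the smaller singular value of a 2x2 matrix."""
    (a, b), (c, d) = m
    s = a * a + b * b + c * c + d * d
    det = a * d - b * c
    disc = math.sqrt(max(s * s - 4.0 * det * det, 0.0))
    return math.sqrt((s + disc) / (s - disc))


gamma = math.tan(math.pi / 24)  # shear parameter for a 7.5 degree offset
sheared_axis = transform(shear_along_y(gamma), (1.0, 0.0))  # east-west grid axis
offset_deg = math.degrees(math.atan2(sheared_axis[1], sheared_axis[0]))
ratio = axis_ratio(shear_along_y(gamma))
print(f"offset = {offset_deg:.2f} deg, axis ratio = {ratio:.3f}")
```

Under these assumptions the grid axis rotates by 7.5° and the axis ratio comes out near 1.14, of the same order as the shear-induced ellipticity quoted in the captions.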
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Extended Data Figure 7 | Experience-dependent changes in grid patterns 
can be explained by shearing of grids that are anchored to diagonally 
opposite corners. To address effects of experience, grid orientation was 
compared at two stages of acquaintance with the environment, during the first 
trial of exploration in a novel square box in a novel test room and during 
exploration of a similar but highly familiar box in a familiar room. We used a 
previously published set of 20 grid cells from 5 rats that ran in 1 m wide square 
boxes with shape and colour similar to the 1.5 and 2.2 m boxes in the main 
study’. 85% of the 20 grid cells had 6 fields or more. a, Rate maps for a 
representative grid cell recorded in a 1 m wide square box in a familiar room 
(left), a novel room (middle), and a second time in the familiar room (right). 
b, Cumulative distribution frequency plot showing distribution of orientation 
offsets in novel and familiar rooms for cells that were recorded in both 
environments. c, d, Ratio of grid spacing (c) and grid ellipticity (d) in the novel 
and the familiar environment (novel/familiar). Red line shows mean 

(± s.e.m.). *P < 0.05. Grid spacing and grid ellipticity increased from familiar
to novel by factors of 1.06 ± 0.03 and 1.08 ± 0.03, respectively (one-sample
t-test of log ratios: t(12) = 2.25, P < 0.05 and t(12) = 2.30, P < 0.05; all cells
recorded in both environments), consistent with previous observations of grid 
cells in novel environments". e, Distribution of ellipse orientation in familiar 
and novel environments (left and right, respectively). Orientation is expressed 
in relation to the east-west wall (inset). Ellipse orientation was more sharply 
distributed in the novel environment, with a circular mean of 81.1° and s.d. of 
16.7°, near orthogonal to the anchoring wall. In the familiar environment, the 
distribution was broader (circular s.d. of 37.4°), suggesting that the impact of 
the walls is relaxed with repeated experience. f, Analyses of simulated grid 
patterns showing that both de-elliptification and non-coaxial rotation can be 
induced also with elongated grid patterns as the starting point, if the grid is 
anchored to opposite corners of the recording box. Top left, square lattice 
parallel to the east-west wall axis, elongated along the north-south wall axis. An 
ellipse is fitted to indicate degree of elongation (red). Top right, same lattice but 
allowed to relax towards a less elongated state. If the grid is anchored to 
opposing corners (red dots), a compression shear (corner shear) vector 
displacement field is generated (red arrows). Bottom left, simulated grid 
pattern, aligned and elongated in the same way as the lattice above. The 
elongation produces an ellipse (red) oriented along the north-south axis. 

No rotation is present at this initial stage. Bottom right, relaxation towards a 
less deformed state, with the grid anchored to opposing corners, produces 

a sheared pattern with an angular offset. Shearing force vectors are shown 

as red arrows in each of the non-anchoring corners. The transform is 
depending on which corner pairs operate as anchors. So long as only one γ is
non-zero, the transform produces shear-like displacement along one of the 
cardinal axes, but the shear axis is rotated 45° so as to leave the two anchor 
corners in place. Corner shearing minimized ellipticity and removed the 
angular offset of the pattern (see i). g, h, Analysis of relationship between 
ellipticity, angular offset and elongation level in novel environments. 

g, Optimal corner-shearing for minimizing ellipticity in an elongated grid. 
Ellipticity values are brightness-coded. Changes in grid orientation may 

take place after minimization of the elongation of the grid when the grid is 
anchored to opposite corners. For such changes to occur, shearing should 
minimize ellipticity given the initial elongated state. To test this idea, we 
systematically elongated simulated grid patterns along the wall axis 
perpendicular to the grid alignment axis (y axis in plot). For each elongation 
level, we next corner-sheared this pattern (with corners as anchoring points), 
through a range of shear parameters (x axis in plot; values correspond to
the offsets associated with each shear parameter, offset = tan⁻¹(shear
parameter)). For all elongation factors >1, shearing to a level >0 produced less
ellipticity than in the original pattern. This was not the case for the standard 
shear transform in which no corner anchoring occurs (not shown). Across 
elongation levels, we determined which shearing parameter minimized 
ellipticity (blue line). We next found the intersection between these optima and 
the shear parameter that produces a 7.5° offset (red line). The elongation factor 
at this intersection was 1.28. Thus, in order for a 7.5° offset to represent the 
optimal amount of shearing in simulated data with anchoring to diagonally 
opposite corners, the initial elongation factor of the grid pattern must be 1.28,
very close to the observed ellipticity in the novel environment (panel h).

h, Cumulative distribution function (black line) of ellipticity in grid cells 
recorded in a novel room. The optimal elongation factor is indicated in blue. 
Note the proximity of this factor to the median ellipticity level. i, Frequency 
histograms showing actual distribution of orientation for each grid axis (top), 
and distribution after corner-shearing using the transform in f (middle and 
bottom). Middle panel, shearing to one pair of anchoring corners; bottom, 
shearing to the other pair. After corner-shearing to the first pair, the offset was 
minimized to the same extent as in the simple shearing paradigm (middle, 
red asterisk). The offset could be abolished almost completely by corner 
shearing (peak symmetry offset after shearing 0°, kernel smoothed density 
curve with Gaussian kernel width 1.35°, 60% of the data distributed within 
0-5°). After corner shearing, however, the distribution of ellipticity displayed 
less variation than after simple shearing (s.d. of 0.0025 versus 0.009). 

j, Proposed model of deformation and rotation of grid patterns as a function of 
experience. From left to right, default minimum-energy state of grid, elongated 
grid in novel environment, sheared grid with rotation after experience, 

and reversal of sheared grid to default state by reverse corner-shearing analysis. 
We suggest that, in novel environments, grid cells may start out with an 
orientation that aligns one grid axis with a band along a wall defined by the 
activity of border cells. This initial alignment may disrupt the symmetry of 
the grid pattern. Through shearing, the grid may then be relaxed towards 

a lower-energy-state solution that is less dependent on the initial 

anchoring segments. 
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Extended Data Figure 8 | Distribution of offset and ellipticity in a circular 
environment. We have shown that, in square environments, the grid pattern is 
deformed and rotated by shear forces parallel to the walls of the environment. 
Here we show how grid orientation and grid ellipticity are distributed in a 
circular environment. We used a sample of 23 grid cells from 6 rats that had 
been recorded in a 2 m wide circular environment in a previous study’. 100% of 
the cells had 6 grid fields or more. a, Example grid rate maps (top) and 
associated spatial autocorrelograms (bottom). b, Histogram of grid orientation 
for all 3 grid axes and across all cells recorded in the circle. Grid orientation was 
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more variable than in the square environments. The distribution of mean 
grid orientation was not significantly different from uniform (Rayleigh test 
(mean grid orientation was multiplied by 6 to achieve 360° range); Z = 1.23, 
P = 0.30). The ellipticity of the grid pattern was significantly higher
than the ellipticity in the square boxes (1.24 ± 0.025 versus
1.17 ± 0.004; Z = −2.98, P = 0.003). This increase in ellipticity likely reflects
grid pattern variability due to the absence of local geometric landmarks 
such as corners. 
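The uniformity test used for grid orientation can be sketched as below. Because grid orientation repeats every 60°, the angles are multiplied by 6 before the mean vector is computed. The P value uses a standard small-sample approximation for the Rayleigh test. It is an assumption that this matches the implementation used for the figure, and the function names are hypothetical.

```python
import cmath
import math


def rayleigh_p(z, n):
    """Approximate P value of the Rayleigh test from the statistic z and sample size n."""
    correction = (1.0
                  + (2.0 * z - z * z) / (4.0 * n)
                  - (24.0 * z - 132.0 * z ** 2 + 76.0 * z ** 3 - 9.0 * z ** 4)
                  / (288.0 * n * n))
    # The series can leave [0, 1] for extreme z and small n, so clamp the result.
    return min(max(math.exp(-z) * correction, 0.0), 1.0)


def rayleigh_test(angles_deg, fold=1):
    """Return (mean vector length, Z, P) after multiplying the angles by `fold`."""
    n = len(angles_deg)
    resultant = sum(cmath.exp(1j * math.radians(fold * a)) for a in angles_deg)
    r = abs(resultant) / n  # mean vector length
    z = n * r * r           # Rayleigh statistic
    return r, z, rayleigh_p(z, n)
```

For instance, `rayleigh_p(1.23, 23)` evaluates to roughly 0.295, in line with the P = 0.30 quoted above for 23 cells.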
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Extended Data Figure 9 | Effect of shearing on deformation and 
reorientation of grid patterns in 1.5 m and 2.2 m environments. 

a, b, Distribution of angular offset after shearing along each of the cardinal axes 
of the 2.2 m box. a, Kernel smoothed density estimates (Gaussian kernel width: 
2.5°) of grid orientation for all 3 grid axes for all cells recorded in the 2.2 m box. 
The top panel shows orientation after shearing along the axis orthogonal to 
the alignment axis Wy (solid black curve). Shearing was performed separately 
for cells aligned to the east-west and north-south axis of the box and then 
combined on the basis of which relationship the shear axis had to the alignment 
axis (orthogonal for top plot). Dashed grey lines show the original distribution 
before shearing. Red lines indicate the orthogonal box axes (0 ± 30°; 0°
corresponds to the east-west box axis, 30° to the north-south axis). Peaks 
detected in the sheared distribution around the red lines are reported above 
each peak. Note reduction—but not complete elimination—of angular offsets 
after shearing orthogonal to the alignment direction. Bottom, same as above but 
for shear axes parallel to the alignment axes. Shearing along this axis did not 
result in reduction of the offset. b, Kernel smoothed density curves (Gaussian 
kernel width: 1.35°) for absolute offsets from nearest multiples of 60° (referred 
to the parallel configuration) after shearing along both axes: orthogonal to 
alignment axis (top) and parallel to the alignment axis (bottom). Methods for 
shearing were the same as in a. Peak for each distribution is indicated by red 
solid lines. Axes were sorted per cell according to their angular distance from 
the nearest wall (axis 1 = Amin; axis 3 had the largest angle from nearest wall).
After shearing orthogonal to the alignment axis, the offset peak was reduced 
considerably. Shearing along the other axes did not reduce the offset. c-e, The 
effect of adding a second shear interaction. c, Effects of anchoring to multiple 
walls in a simulated grid. A square lattice is used for illustrative purposes. 
Top row, square lattice aligned to the cardinal axes with no shear interactions 
from any wall (left). Simulated rate map of grid aligned to the east-west box axis 
(middle). Spatial autocorrelogram generated from rate map (right). Ellipse 
and inner fields are shown in black. Middle row, the square lattice after a shear 
transform along the north-south axis (left). Black line with arrow illustrates 
shearing interaction from wall. Shear origin (anchoring point) is shown in red. 
Simulated rate map after shearing is shown in the middle, and the resulting 
autocorrelogram is shown to the right. Bottom row: square lattice (left), 
simulated rate map (middle) and resulting autocorrelogram (right) after 
sequential shearing forces from two wall axes. Note that adding the second 


shear force has no impact on the smallest angular offset (Amin) but the
orientation of axes 2 and 3 is changed and the ellipse orientation is altered 
accordingly. d, Grid axes and ellipses from autocorrelograms in c before (left) 
and after (right) the shearing that minimized both ellipticity and grid offset 
(illustrated with insets). The east-west axis and its 60° multiples are shown in 
grey. The axis nearest one of the walls is highlighted in red. Note the elimination 
of grid offset in the case with single shear interactions (middle), but inability of 
simple shearing to eliminate the offset (asterisk) when two shear interactions 
were operative (bottom). e, Effect of second shear interaction on grid 
orientation and deformation. We systematically applied shearing from a 
second shear axis to the simulated grid that was already sheared in one direction 
to induce a 7.5° offset. A range of shear factors was explored (−0.35 to 0.35).
Left column, effect of second shear on grid orientation for individual grid axes. 
The middle panel shows the axis that was originally sheared to become offset by 
7.5°. Note minimal change in offset in this axis, while systematic changes 
occurred in the two remaining axes (top and bottom respectively), resulting in 
further elliptic deformation of the grid pattern. Top right, grid orientation 
averaged across the 3 axes. Bottom right, ellipticity of the grid pattern as a 
function of the second shear parameter. f, Quadrant spatial autocorrelograms 
for grid cells with large spacing yielding fewer than 3 grid fields per quadrant. 
Left column, simulated rate maps with one, two or three grid fields. Middle 
column, spatial autocorrelograms from the rate maps in the left column. As
expected, a single field was not informative in terms of grid features such as grid 
orientation or spacing, whereas two fields yielded information about one 

axis and three fields were informative about all axes. Taken together across all 
cells from a module, such contributions may average to overcome the effect of 
sparse field-sampling and recapitulate the original local grid geometry. 

Right column, simulation based on 20 grid cells with different grid phase (but 
similar spacing and orientation) showing that when spatial autocorrelograms 
from multiple maps (from one module) with varying number of fields from 
various axes are combined (averaged), grid features from all axes can be 
retrieved and used to determine grid geometry in subdivisions of the original 
environment. White asterisks show field peaks detected by the algorithm. 
Orange lines show the grid orientation used to generate the simulated grid cells 
within a quadrant. The average autocorrelogram faithfully captures the original 
grid geometry even if the average number of fields per cell is less than three. 
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Extended Data Figure 10 | Two-axis corner-shearing removed the offset in 
the 2.2 m environment. Several observations point to shear-like forces from 
two perpendicular walls as more common in the 2.2 m environment than the 
1.5 m environment (Fig. 3). Minimizing ellipticity by simple shearing
completely removed the angular offset in the 1.5 m box. In the 2.2 m box,
however, the effect was only moderate, reflecting the fact that the grid was 
anchored to two walls, not one. Here we sought to determine if the two optimal 
shear parameters associated with two-axis corner-shearing were recoverable 
with the same approach as for simple shearing (minimization of ellipticity). 
Specifically, we tried to recover the original configuration of a simulated grid 
pattern on which we had applied two-axis corner-shearing in advance. The 
starting (default) grid pattern was made perfectly symmetrical and aligned to 
the east-west axis. We then applied 2-axis corner-shearing using the reverse 
transform proposed to occur as the result of grid shrinkage with corner 
anchoring during familiarization (corner-shearing): 


$$f(x, y) = \begin{pmatrix} 1 & \gamma_1 \\ \gamma_2 & 1 + \gamma_1\gamma_2 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}$$

We set γ1 = tan(π/24) and γ2 = −tan(…). a, Simulated grid pattern before


(black) and after (red) two-axis shearing. The offset from the first shear axis is 
7.5° but the second shear axis introduces further rotation, combined with 
deformation of the two remaining axes. b, Ellipticity surface resulting from 
systematic exploration of the two-shear parameter dimensions. Height and 
colour indicate ellipticity. x and y axes denote the angular offset that the 
shearing parameters would cause in simple shearing (from −40 to 40°). Note
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the prominent minimum corresponding to a unique solution to the problem of 
recovering the original pair of shear parameters (arrow). c, Same as in b, but 
as pure colour map. The parameter set that was used to shear the grid initially is 
shown with green lines. The point of minimum ellipticity is shown as a 

white circle. In this case, the retrieved parameters were exactly those that were 
used in the original transform of the pattern. The example illustrates that 

the original shear parameter set could be recovered completely by two-axis 
reverse shearing. d, Colour map of angular offset resulting from two-axis 
shearing across the same range of shear parameter values as in a. Black 
corresponds to wall alignment (0° offset). The point of minimal ellipticity 
(white circle in c) is the same as for minimal offset (white circle in d). 

e, Cumulative distribution function showing angular offset of the data recorded 
in the 2.2 m box before (black) and after (red) two-axis corner shearing 

(Fig. 3h). Note consistent shift to the left after shearing. 2-axis corner shearing 
significantly reduced the original offset (Wilcoxon rank sum test: Z = 6.6, 
P = 4.4 × 10⁻¹¹); the offset was abolished almost completely by corner
shearing (peak symmetry offset after shearing 0°, kernel smoothed density 
curve with Gaussian kernel width 1.35°, 60% of the data distributed within 
0-5°), while normal 2-axis shearing had little effect (Fig. 3h). f, Scatterplot of 
optimal corner-shear parameter and original grid offset for all data in the 

2.2 m box. Shown is tan⁻¹ of the shear parameter (in degrees), to illustrate the offset
that the parameter would yield in simple shearing. The two main scatter
clusters correspond to the two distinct anchoring solutions in the 2.2 m
environment (that is, the two wall-axes). Red dashed lines show the wall axes of
the environment ([−30, 0, 30]°).
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The initial multiplicity of stellar systems is highly uncertain. A num- 
ber of mechanisms have been proposed to explain the origin of binary 
and multiple star systems, including core fragmentation, disk frag- 
mentation and stellar capture’*. Observations show that protostellar 
and pre-main-sequence multiplicity is higher than the multiplicity 
found in field stars*’, which suggests that dynamical interactions 
occur early, splitting up multiple systems and modifying the initial 
stellar separations*”. Without direct, high-resolution observations 
of forming systems, however, it is difficult to determine the true 
initial multiplicity and the dominant binary formation mechanism. 
Here we report observations of a wide-separation (greater than 
1,000 astronomical units) quadruple system composed of a young 
protostar and three gravitationally bound dense gas condensations. 
These condensations are the result of fragmentation of dense gas 
filaments, and each condensation is expected to form a star on a time- 
scale of 40,000 years. We determine that the closest pair will form a 
bound binary, while the quadruple stellar system itself is bound but 
unstable on timescales of 500,000 years (comparable to the lifetime of 
the embedded protostellar phase’’). These observations suggest that 
filament fragmentation on length scales of about 5,000 astronomical 
units offers a viable pathway to the formation of multiple systems. 

Barnard 5 (B5) is a dense core in the Perseus star-forming region (at 
a distance of 250 pc) that hosts at least one young, forming star’’. Imaging 
of B5 in the emission of the dense-gas-tracing NH3(1,1) line shows it to be 
an example of a ‘coherent’ dense core”’, which is a contiguous high-density
region with subsonic levels of turbulence’*. Higher-resolution imaging 
reveals narrow filamentary structure within the coherent core’*. We 
observed the NH3(1,1) and (2,2) lines using the Karl G. Jansky Very
Large Array (VLA)"*, which reveals that the filaments in B5 are frag- 
menting and that they are in the process of forming a wide-separation 
multiple stellar system. 

Nearly half of all stars reside in multiple star systems*'*. Consequently, 
a host of phenomena, ranging from supernova rates to planet formation, 
depend on understanding stellar multiplicity’”. Because of the observa- 
tional challenges associated with observing early systems, the dominant 
ideas for binary formation are based on simulations and analytic argu- 
ments, which naturally require a variety of assumptions*’’. To date, 
observations have not captured the formation of a binary system at a 
stage where its origin is unambiguous, and prior observations of core 
substructure lack the spatial and kinematic resolution to be used in 
predicting whether observed structures would form protostars and/or 
produce a bound system’’”°. The observed kinematics and separation
(>1,000 astronomical units, AU) of the B5 system are significant because
they demonstrate a clear mechanism for wide binary formation and provide
convincing evidence that the observed condensations will become
a bound multiple star system. 


Detailed knowledge of the underlying distribution of dense gas is the 
key to determining which structures will go on to form stars. Here we 
identify the dense gas structures that are most likely to form stars using 
the dendrogram technique”. Dendrogram analysis is a hierarchical struc- 
ture decomposition that uses isocontours to identify individual features, 
while also determining where these contours merge with adjacent struc- 
tures to create a new parental structure. We refer to the smallest scale (and 
brightest) structures in the dendrogram as condensations. These are 
the most likely places for an individual star to form. Figure 1a shows the 
B5 region as seen in dense gas (number density of H2, nH2 ≥ 10⁴ cm⁻³),
with the protostar and the identified gas condensations shown by a star
and circles, respectively. The mass of the well known protostar B5-IRS1
is 0.1 solar masses (Msun; ref. 22), while the masses of condensations
B5-Cond1, B5-Cond2 and B5-Cond3 are 0.36 ± 0.09 Msun, 0.26 ± 0.12
Msun and 0.30 ± 0.13 Msun, respectively. Uncertainty in these masses is
dominated by the uncertainty in the temperature used to convert measured
fluxes to masses. The radii of the three condensations are respectively
2,800 AU, 2,300 AU and 2,500 AU, while the projected separations
between the same three condensations and the protostar are 3,300 AU,
5,100 AU and 11,400 AU (see Methods). The half-mass radii of the
condensations are about half the condensation radii. This, combined with
the mass radius relations (Extended Data Fig. 2), suggests that the central 
regions will collapse faster than the whole condensations and before 
interactions between condensations can play a major role in the stars’ 
initial separations. Although these separations are large, they are con- 
sistent with initial protostellar pair separations predicted for core frag- 
mentation by numerical simulations’. In the simulations, protostellar 
separations evolve rapidly on timescales of 0.1 Myr, and some systems 
become unbound while others migrate to closer proximity. 

Projected proximity on the sky does not necessarily imply that objects 
are physically related. However, the line-of-sight velocities of the observed 
condensations are similar and the grouping is likely to be physically
associated. The velocity dispersion, σv, of the dense gas provides another
important piece of information, the gas kinetic energy, which is needed to
determine whether the condensations are transient structures or
gravitationally bound and likely to form a star. The velocities and velocity
dispersions of the condensations are determined by fitting NH3(1,1)
and (2,2) line profiles'*. The condensations and protostar display the
same centroid velocity to within 0.2 km s⁻¹ and are therefore associated
with the same dense core. The level of turbulence in this region is so low,
σturb ~ 0.53-0.66 times the sound speed in the gas", that gravity will
overwhelm the combined turbulent and thermal pressure in all the identified
condensations, and a star will probably form in each case. The timescale 
for these condensations to undergo gravitational collapse is approximately 
the gas free-fall time, which we estimate to be 40,000 years (Methods). 
This timescale is sufficiently short to ensure that the system’s spatial 
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Figure 1 | High-angular-resolution image of dense gas and stellar 
progenitors. a, Background image from the JVLA of the Barnard 5 region shows 
the dense gas traced by NH3(1,1). It reveals two filaments, which together host
three gravitationally bound condensations (B5-Cond1, 2, 3). Red contours and 
orange-filled circles show the condensation boundaries and centres, while the star 


configuration will remain nearly unchanged during collapse, even if the 
protostars move as far as possible (in straight lines at the gas sound speed) 
for the duration of the collapse (<1,700 Au or 6.8” at the distance of Per- 
seus). Figure 1b shows, as dotted circles, the possible ranges over which 
protostars could move in a free-fall time. The circles are smaller than the 
current condensation radii, so we conclude that this is a multiple system 
caught at the beginning of its formation. 

In order to determine if the multiple stellar system is bound, it is nec- 
essary to estimate the masses of the stars that will be formed within the 
condensations. However, there are two complicating factors that make 
mass estimates uncertain. First, we must estimate what fraction of 
the condensation mass will end up in the star. Comparison between 
the initial mass function of stars and the distribution of dense core 
masses suggests that individual cores have a star formation efficiency 
of EpenseCore = Mstar/MpenseCore ~ 30% (refs 23-25). Theoretical esti- 
mates based on the effects of protostellar outflows predict core effi- 
ciencies Of &penseCore = 25-75% (refs 26-28). Here, the condensations 
are embedded inside a previously identified dense core, B5, and they 
have radii one-tenth those of typical dense cores. Therefore, it would 
not be surprising if their star formation efficiency is close to 100%, at 
ECondensation — Mstar/Mcondensation = 75% (ref. 26). Second, since the 
condensations are embedded within dense filaments, it is possible that 
the final stellar masses will be higher than the current measured con- 
densation mass, because additional gas can flow along the filaments 
and accrete onto the condensations”””. If we adopt a very conservative 
estimate for the efficiency of 30%, and assume no additional mass accre- 
tion from the filaments, the final stellar masses formed from the con- 
densations will be above the brown dwarf limit (80 Jupiter masses). 
Further fragmentation could occur within the condensations, however, 
given the low multiplicity fraction at these masses, <26% (ref. 4), so for 
our analysis we assume the most likely outcome, single stars. The lower 
the final stellar masses, the less likely the system is to be bound, so we 
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indicates the protostar (B5-IRS1) location. b, Contour map showing the filaments 
in dense gas. Greyscale circles show the distance that could be covered during 
40,000 years while moving at the speed of sound, 0.2 km s |. Inaandb, filled red 
circle at bottom left shows the angular resolution of the observations, and scale bar 
is shown at bottom right. 


have made the most pessimistic assumptions possible (giving the lowest 
stellar masses) in evaluating boundedness. 

Depending on the final stellar masses and kinematics, the resulting 
multiple stellar system could either be strongly bound or quickly dis- 
solve owing to dynamical interactions. Given the dynamical instability 
of higher order systems, it is very likely that even if all the stars are initially 
bound, one or more will be dynamically ejected at a later time. For the
closest possible pair, B5-IRS1 and B5-Cond2, we calculate the energies 
for a wide range of final stellar masses and under different kinematical 
and spatial distribution assumptions. In each case, we make different as- 
sumptions to reconstruct the total velocity dispersion and total separa- 
tion based on our measured line-of-sight values. Figure 2 shows that the 
ratio of kinetic to gravitational energy is much less than one for all these 
cases, and therefore the pair is gravitationally bound. 

Similarly, we compare the kinetic and gravitational energies for the 
expected stellar system taken all together, and find that the quadruple 
system is bound. Although bound, it is not likely to be a stable hierarch- 
ical system in the long term (see Methods), and the system will probably 
dissipate into a wide-separation binary system (B5-IRS1 and B5-Cond2). 
An important caveat is that this analysis does not take into account the 
effect of gas. The system is embedded in a larger reservoir of gas (the B5 
core), which is several times the combined mass of the condensations. 
This additional gas can have two effects on the system evolution. First, 
the gravitational potential of the dense gas enhances the binding energy 
of the system by increasing the stellar escape velocity. Eventually, much 
of this gas will be removed by outflows. Second, the gas acts as a drag force 
on the stars, dissipating some of the stellar kinetic energy. Both these 
effects support the same outcome: a bound stellar system, at least during 
the formation stage. These results show that fragmentation of filaments 
would happen at scales smaller than those predicted by Jeans fragmentation
of dense cores, and therefore filaments (or substructure in
general) might be crucial ingredients in the formation of multiples. 
Figure 2 | Ratio of kinetic to gravitational energy for B5-IRS1 and
B5-Cond2 as a function of final stellar mass. A binary system is bound if the
kinetic-to-gravitational energy ratio is below unity. We calculate the energies
for a wide range of efficiencies, $\epsilon_{\rm condensation} = M_{\rm star}/M_{\rm condensation}$, and under
different kinematical and spatial distribution assumptions (see text for details)
for the B5-IRS1 protostar and the final object from condensation B5-Cond2.
First, we use the on-sky separation and the line-of-sight velocity difference as
the total binary distance and velocity difference (line a, red). Next, we assume
that the velocity difference along the line-of-sight is representative for the
difference in the other directions, and therefore the total velocity difference is
$\sqrt{3}$ times the velocity difference along the line-of-sight (line b, orange). Line c
(light blue) assumes that the binary separation on the plane of the sky is a good
estimate for the separation along the line-of-sight, and therefore the total
separation is estimated as $\sqrt{2}$ times the separation on the plane of the sky.
Finally, we compute the energies assuming both b and c together (line d, dark
blue). Grey vertical lines denote some representative efficiencies, $\epsilon_{\rm condensation}$.
Black vertical line marks the upper mass limit for a brown dwarf (0.076 $M_{\rm sun}$).
This figure shows that for all estimates the closest separation binary is bound.


Additional observations of the distribution of dense gas in other regions 
using the VLA and/or Atacama Large Millimeter Array (ALMA) will 
determine the frequency of occurrence of filaments (or substructure) in 
dense cores and the distribution of separations of pre-stellar condensa- 
tions at birth. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


Observations and data reduction. Very Large Array. We conducted VLA observa- 
tions of the B5 region on 2011 October 16-17 in D-array configuration, and on 2012 
January 13-14 in CnD-array configuration (project 11B-101). We used the high- 
frequency K-band receiver and configured the WIDAR correlator to observe two 
basebands of 4 MHz bandwidth around the NH3(1,1) and (2,2) lines. Each baseband
is split into 8 adjacent spectral windows of 500 kHz bandwidth, with a channel
separation of 0.049 km s⁻¹. The quasar 3C 48 was used as flux calibrator, 3C 84 as
the bandpass calibrator, and J0336+3218 as the phase calibrator. 

We reduced the data with the Common Astronomy Software Applications package.
The images were created using multi-scale clean (with scales of 0, 4 and 12
arcsec and smallscalebias parameter of 0.2) with a robust parameter of 0.5 and a 6
arcsec beam. We also included the NH3 single dish data obtained with the Robert
C. Byrd Green Bank Telescope as a model image to recover the extended emission.
The image has a noise level of 4 mJy per beam per channel. 

The integrated emission map is calculated using all hyperfine components in the
following velocity ranges: (−10.016, −8.534) km s⁻¹, (1.845, 3.575) km s⁻¹, (9.012,
11.483) km s⁻¹, (9.012, 11.483) km s⁻¹, (16.919, 18.896) km s⁻¹, and (28.781, 30.758)
km s⁻¹. The final noise achieved is 2.8 mJy per beam km s⁻¹.

James Clerk Maxwell Telescope (JCMT). We observed B5 in the dust continuum
emission at 450 μm and 850 μm using the Submillimetre Common-User Bolometer
Array 2 (SCUBA-2) bolometer array at the JCMT. The observations were carried
out on 2013 August 16 and 23, and on 2013 August 3 under project M13BU14 during
grade 1 weather. We use the iterative map-making technique, makemap, with 0.5
arcsec pixels to match the NH3(1,1) VLA map. The initial reductions of the scans
are co-added to create a mosaic, which we use to create a mask of signal-to-noise
ratio >5. Our final mosaic is created by co-adding a second reduction of the individual
scans where the mask defines areas with emission. The angular resolution at
450 μm and 850 μm is 9.8 arcsec and 14.6 arcsec (ref. 34), respectively. We only use
the 450 μm map, since its resolution is similar to the VLA observations.

We calibrate the flux scale of the observations using the Flux Correction Factor
(FCF) of 4.71 ± 0.5 Jy pW⁻¹ arcsec⁻² for 450 μm and of 2.34 ± 0.08 Jy pW⁻¹ arcsec⁻²
for 850 μm (ref. 34). We measure a noise level of 0.026 mJy per pixel and 0.23 mJy
per pixel in the emission-free regions of the 850 μm and 450 μm maps, respectively.
Extended Data Fig. 1 shows the final SCUBA-2 450 μm and 850 μm maps, in which both the
condensation B5-Cond1 and bright emission around the protostar B5-IRS1 are
clearly detected, while condensations B5-Cond2 and B5-Cond3, although detected,
are overwhelmed by the emission from B5-IRS1. The morphology of the emission
is similar to the NH3(1,1) emission map, confirming the filamentary structure and
the condensations.
Condensation identification. We use the dendrogram algorithm to identify the
structures in the NH3(1,1) integrated intensity map. Each structure corresponds to
a surface of constant intensity. An advantage of the algorithm is that structures can
be nested inside one another, representing the hierarchy of the cloud, and then simply
visualized using a tree diagram. The condensations are the ‘leaves’, that is, the highest
level in the dendrogram decomposition. We produce the dendrogram with the following
parameters: min_value = 4$\sigma_{\rm rms}$ (minimum intensity considered in the analysis),
min_delta = 2$\sigma_{\rm rms}$ (minimum spacing between isocontours), and min_npix =
250 (minimum number of pixels contained in a structure), where $\sigma_{\rm rms}$ = 5 mJy per
beam km s⁻¹ is the rms noise. The condensation radius, $R_{\rm eq}$, is defined as the equivalent
radius, $R_{\rm eq} = \sqrt{{\rm area}/\pi}$.

The centroid velocity of each condensation is computed by averaging the centroid
velocity within a 6 arcsec × 6 arcsec box centred at the peak emission in the NH3(1,1)
map. The velocity dispersion of each condensation is determined by taking the average
of the velocity dispersions within the condensation. In both cases, the uncertainties
are estimated from the standard deviation of the quantities measured.

The total flux for each condensation is defined as the total flux minus the background
emission removed within the structure boundary determined by the dendrogram.
Extended Data Table 1 displays the fluxes calculated from the NH3(1,1)
emission map for all condensations. Of the three condensations, only B5-Cond1
flux is not contaminated by the bright B5-IRS1 in the SCUBA-2 450 μm dust emission
continuum, and we include its total flux in the table.

Mass determination for condensations. We determine the total mass of B5-Cond1
using the total flux of the SCUBA-2 450 μm dust emission map assuming optically
thin emission,

$$M_{\rm dust} = \frac{d^2 F_\nu}{\kappa_\nu B_\nu(T)}$$

where d is the distance, $F_\nu$ is the total flux, $\kappa_\nu$ is the dust opacity per unit mass at
frequency $\nu$, and $B_\nu(T)$ is the Planck function at temperature T. We use a dust to gas
ratio of 0.01, and a dust opacity per unit mass of $\kappa_\nu = 0.1(\nu/1\,{\rm THz})^2$ (ref. 35), which
at 450 μm gives $\kappa_{450\,\mu{\rm m}}$ = 0.044 cm² g⁻¹. We assume a temperature of T = 10 K,
which is the typical temperature of starless cores in Perseus, and distance of 250 pc
(ref. 37). In the case of B5-Cond1, this gives a mass estimate of 0.39 $M_{\rm sun}$.
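
The optically thin mass estimate can be sketched numerically as follows. This is an illustrative calculation rather than the reduction code used for the paper: the function names are hypothetical, the opacity exponent of 2 is inferred from the quoted value of 0.044 cm² g⁻¹ at 450 μm, the opacity is treated as already including the dust-to-gas ratio, and the flux is the 450 μm value listed for B5-Cond1 in Extended Data Table 1. The result depends on the exact constants and flux adopted, so it should only be expected to land near the quoted mass.

```python
import math

# Physical constants (SI units).
H = 6.62607015e-34           # Planck constant, J s
K_B = 1.380649e-23           # Boltzmann constant, J/K
C = 2.99792458e8             # speed of light, m/s
PC = 3.0856775814913673e16   # parsec, m
M_SUN = 1.98841e30           # solar mass, kg
JY = 1e-26                   # jansky, W m^-2 Hz^-1


def planck_nu(nu_hz, temp_k):
    """Planck function B_nu(T) in W m^-2 Hz^-1 sr^-1."""
    x = H * nu_hz / (K_B * temp_k)
    return 2.0 * H * nu_hz**3 / C**2 / math.expm1(x)


def opacity_cm2_per_g(nu_hz, beta=2.0):
    """kappa_nu = 0.1 (nu / 1 THz)^beta in cm^2 g^-1 (beta = 2 is an inferred assumption)."""
    return 0.1 * (nu_hz / 1e12) ** beta


def optically_thin_mass(flux_jy, distance_pc, temp_k, wavelength_m):
    """Mass in solar masses from M = d^2 F_nu / (kappa_nu B_nu(T))."""
    nu = C / wavelength_m
    kappa_si = opacity_cm2_per_g(nu) * 0.1   # 1 cm^2/g = 0.1 m^2/kg
    d = distance_pc * PC
    mass_kg = d**2 * flux_jy * JY / (kappa_si * planck_nu(nu, temp_k))
    return mass_kg / M_SUN


kappa_450 = opacity_cm2_per_g(C / 450e-6)
mass_cond1 = optically_thin_mass(0.992, 250.0, 10.0, 450e-6)
print(f"kappa(450 um) = {kappa_450:.3f} cm^2/g, M(B5-Cond1) = {mass_cond1:.2f} Msun")
```

Because the Planck function at 10 K is steep at 450 μm, modest changes in the assumed temperature shift the mass noticeably, which is one reason to treat the output as an order-of-magnitude check.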

We use the mass-to-NH3 flux ratio derived from B5-Cond1, whose isolation
makes mass calculations most reliable, to estimate the masses of condensations
B5-Cond2 and B5-Cond3. This gives masses of 0.33 $M_{\rm sun}$ and 0.39 $M_{\rm sun}$ for
B5-Cond2 and B5-Cond3, respectively.

We also plot the enclosed mass as a function of radius in Extended Data Fig. 2a.
Here we also compare these mass-radius relations with those predicted for density
profiles of $\rho \propto r^{-2}$ and $\rho \propto r^{-1.5}$, and it is clear that the condensations are better
described by a $\rho \propto r^{-1.5}$ density profile between 800 AU and 1,900 AU. A density
profile of $\rho \propto r^{-2}$ is a good description for a core in (or close to) hydrostatic
equilibrium, while a profile of $\rho \propto r^{-1.5}$ is a good model for free falling envelopes. This
also supports the notion that the condensations in B5 are centrally concentrated
and close to forming a central object.
Virial analysis for condensations. We calculate the virial parameter (the ratio
between kinetic and gravitational energy) for a spherical core assuming a density
profile $\rho \propto r^{-1.5}$ (ref. 39), which (see above) is a good approximation for the
condensation’s density profile from 800 AU to 1,900 AU (see Extended Data Fig. 2):

$$\alpha = \frac{4 R \sigma_v^2}{G M} = 928 \left(\frac{R}{\rm pc}\right) \left(\frac{M}{M_{\rm sun}}\right)^{-1} \left(\frac{\sigma_v}{{\rm km\,s^{-1}}}\right)^{2}$$

where R is the radius, M is the mass, and $\sigma_v$ is the total velocity dispersion of the gas
(including thermal and non-thermal components, $\sigma_v^2 = \sigma_{\rm th}^2 + \sigma_{\rm nt}^2$). A condensation
is bound if $\alpha < 2$.

The average level of turbulence, $\sigma_{\rm turb}$, measured towards condensations B5-Cond1,
B5-Cond2 and B5-Cond3 is $0.66\,\sigma_{\rm th}$, $0.58\,\sigma_{\rm th}$ and $0.54\,\sigma_{\rm th}$, respectively, where
$\sigma_{\rm th}$ = 0.2 km s⁻¹ is the sound speed of the gas for a mean molecular weight per free
particle of $\mu = 2.33$ and a temperature of 10 K.
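
As a rough numerical illustration of the virial parameter for a sphere with a density profile proportional to r^-1.5, the sketch below evaluates the numerical prefactor and one example case. The function names are hypothetical, and the radius of 2,500 AU is a placeholder chosen only for illustration, not a measured condensation radius; the mass and turbulence level are the values quoted for B5-Cond1.

```python
import math

# Physical constants (SI units).
G = 6.6743e-11               # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.98841e30           # solar mass, kg
PC = 3.0856775814913673e16   # parsec, m
AU = 1.495978707e11          # astronomical unit, m


def virial_parameter(radius_pc, mass_msun, sigma_v_kms):
    """alpha = 4 R sigma_v^2 / (G M), appropriate for rho ~ r^-1.5."""
    r = radius_pc * PC
    m = mass_msun * M_SUN
    sigma = sigma_v_kms * 1e3
    return 4.0 * r * sigma**2 / (G * m)


def total_dispersion(sigma_th_kms, turb_over_th):
    """Add thermal and turbulent dispersions in quadrature."""
    return sigma_th_kms * math.sqrt(1.0 + turb_over_th**2)


# Prefactor for R = 1 pc, M = 1 Msun, sigma_v = 1 km/s (quoted as 928 in the text).
coefficient = virial_parameter(1.0, 1.0, 1.0)

# Example: turbulence of 0.66 times the 0.2 km/s sound speed, M = 0.39 Msun,
# and a hypothetical radius of 2,500 AU.
sigma_v = total_dispersion(0.2, 0.66)
alpha_example = virial_parameter(2500.0 * AU / PC, 0.39, sigma_v)
print(f"prefactor = {coefficient:.0f}, alpha = {alpha_example:.2f}")
```

With these inputs the example falls below the threshold of 2, but the outcome scales linearly with the assumed radius, so a measured radius is needed before drawing any conclusion for a real condensation.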

We calculate the virial parameter as a function of radius for each condensation 
using its measured average velocity dispersion (Extended Data Fig. 2b); the con- 
densations are all bound beyond a radius of 1,200 au. The virial parameter initially 
decreases as a function of radius until it reaches a minimum at ~60% of the 
condensation radius, and then it increases until it reaches the edge of the condensa- 
tion as defined by the dendrogram, while staying bound (Extended Data Fig. 2b). 
Free-fall timescale. The free-fall timescale for a uniform density sphere is defined
as

$$t_{\rm ff} = \sqrt{\frac{3\pi}{32 G \rho}} = \sqrt{\frac{\pi^2 R^3}{8 G M}} = 1.6 \times 10^{6}\,{\rm yr} \left(\frac{R}{0.1\,{\rm pc}}\right)^{3/2} \left(\frac{M}{0.1\,M_{\rm sun}}\right)^{-1/2}$$

where $M = 4\pi R^3 \rho/3$. The free-fall timescale, $t_{\rm ff}$, is 4.0 × 10⁴ yr, 3.5 × 10⁴ yr, and
4.1 × 10⁴ yr for B5-Cond1, B5-Cond2 and B5-Cond3, respectively. Thus, about
40,000 years is the ‘typical’ free-fall timescale for this system. This definition
excludes the influence of magnetic fields, which could delay collapse.
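
The free-fall expression and the related estimate of how far a protostar could drift during collapse can be checked with a short calculation. This is a sketch under stated assumptions: the function names are hypothetical, the hydrogen atom mass is used for the mean-particle mass, and the drift assumes straight-line motion at 0.2 km s⁻¹ for 40,000 years at a distance of 250 pc, as described in the text.

```python
import math

# Physical constants (SI units).
G = 6.6743e-11               # gravitational constant, m^3 kg^-1 s^-2
K_B = 1.380649e-23           # Boltzmann constant, J/K
M_H = 1.6735575e-27          # hydrogen atom mass, kg
M_SUN = 1.98841e30           # solar mass, kg
PC = 3.0856775814913673e16   # parsec, m
AU = 1.495978707e11          # astronomical unit, m
YR = 3.15576e7               # Julian year, s


def free_fall_time_yr(radius_pc, mass_msun):
    """t_ff = sqrt(pi^2 R^3 / (8 G M)) for a uniform sphere, in years."""
    r = radius_pc * PC
    m = mass_msun * M_SUN
    return math.sqrt(math.pi**2 * r**3 / (8.0 * G * m)) / YR


def sound_speed_kms(temp_k, mu=2.33):
    """Isothermal sound speed sqrt(k T / (mu m_H)) in km/s."""
    return math.sqrt(K_B * temp_k / (mu * M_H)) / 1e3


def travel_distance_au(speed_kms, time_yr):
    """Straight-line distance covered at constant speed, in AU."""
    return speed_kms * 1e3 * time_yr * YR / AU


t_ref = free_fall_time_yr(0.1, 0.1)        # normalisation of the scaling relation
c_s = sound_speed_kms(10.0)                # sound speed at 10 K
drift_au = travel_distance_au(0.2, 4.0e4)  # maximum drift during collapse
drift_arcsec = drift_au / 250.0            # 1 AU subtends 1 arcsec at 1 pc
print(f"t_ff(0.1 pc, 0.1 Msun) = {t_ref:.2e} yr")
print(f"c_s = {c_s:.2f} km/s, drift = {drift_au:.0f} AU = {drift_arcsec:.1f} arcsec")
```

The drift comes out close to the 1,700 AU and 6.8 arcsec quoted in the main text, and the normalisation is of order 10⁶ yr for a 0.1 pc, 0.1 solar-mass sphere.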

Stability analysis of the multiple system. We calculate the potential and kinetic
energy of each object and determine if all of the objects are bound. The gravitational
potential energy, $V_i$, is calculated as

$$V_i = -\sum_{j \neq i} \frac{G m_i m_j}{r_{ij}}$$

where $m_i$ and $m_j$ are the masses of objects i and j, and $r_{ij}$ is the distance between
them. The kinetic energy of each object, $T_i$, is given by

$$T_i = \frac{1}{2} m_i (v_i - v_{\rm com})^2$$

where $v_i$ is the (line-of-sight) velocity of object i, and $v_{\rm com}$ is the velocity of the
centre of mass of the system. A star is bound to the system if $T_i/|V_i| < 1$.

Using the stellar mass estimates, two-dimensional positions, and one-dimen- 
sional line-of-sight velocities, we find that all four objects comprise a bound system. 
The calculated kinetic-to-gravitational energy ratio of the system is 0.11. In order to 
assess the robustness of this result, we remove each object in turn from the analysis 
and find that the system remains bound. If we were to increase the mass of any of 
the four objects, this would decrease the energy ratio further. 
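
The energy bookkeeping described above can be sketched as follows, using the positions and line-of-sight velocities listed in Extended Data Table 1. The stellar masses are assumptions made only for this illustration: a 30% efficiency applied to the quoted condensation masses, and a placeholder of 0.1 solar masses for B5-IRS1. The function names are hypothetical, and the result is not expected to reproduce the quoted ratio of 0.11, which depends on the masses actually adopted.

```python
import math

# G expressed in AU (km/s)^2 per solar mass.
G = 6.6743e-11
M_SUN = 1.98841e30
AU = 1.495978707e11
G_UNITS = G * M_SUN / AU / 1e6

DISTANCE_PC = 250.0
DEC0_DEG = 32.0 + 51.0 / 60.0   # reference declination used to scale RA offsets

# (name, RA seconds past 03:47:00, Dec arcsec past 32:51:00,
#  velocity in km/s, ASSUMED stellar mass in Msun)
OBJECTS = [
    ("B5-IRS1", 41.548, 43.57, 10.21, 0.10),          # placeholder mass
    ("B5-Cond1", 38.928, 75.31, 10.43, 0.3 * 0.39),   # 30% efficiency assumed
    ("B5-Cond2", 41.627, 56.81, 10.23, 0.3 * 0.33),
    ("B5-Cond3", 42.778, 30.31, 10.30, 0.3 * 0.39),
]


def sky_position_au(ra_s, dec_arcsec):
    """Projected position in AU (small-angle approximation)."""
    x = ra_s * 15.0 * math.cos(math.radians(DEC0_DEG)) * DISTANCE_PC
    y = dec_arcsec * DISTANCE_PC
    return x, y


def energies(objects):
    """Return per-object kinetic energies T_i and potential energies V_i."""
    masses = [o[4] for o in objects]
    vels = [o[3] for o in objects]
    pos = [sky_position_au(o[1], o[2]) for o in objects]
    v_com = sum(m * v for m, v in zip(masses, vels)) / sum(masses)
    kinetic = [0.5 * m * (v - v_com) ** 2 for m, v in zip(masses, vels)]
    potential = []
    for i in range(len(objects)):
        v_i = 0.0
        for j in range(len(objects)):
            if i != j:
                v_i -= G_UNITS * masses[i] * masses[j] / math.dist(pos[i], pos[j])
        potential.append(v_i)
    return kinetic, potential


kinetic, potential = energies(OBJECTS)
per_object = [t / abs(v) for t, v in zip(kinetic, potential)]
# Each pair is counted twice in the per-object sums, hence the factor 1/2.
system_ratio = sum(kinetic) / (0.5 * abs(sum(potential)))
for obj, ratio in zip(OBJECTS, per_object):
    print(f"{obj[0]}: T/|V| = {ratio:.2f}")
print(f"system: T/|V| = {system_ratio:.2f}")
```

Removing one entry from the list and rerunning mimics the robustness test described in the text.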

We next determine whether any of the four objects will become a stable binary 
or triple system, with the caveat that we only know the velocity along the line-of- 
sight and the separation on the sky. We determine the binding energy, semimajor 
axis, and eccentricity of the central or ‘inner’ system (B5-IRS1 + B5-Cond2). We
determine the eccentricity using the following formula:

$$e = \sqrt{1 + \frac{2 (m_1 + m_2) L_{\rm tot}^2 E_{\rm bind}}{G^2 m_1^3 m_2^3}}$$

where $m_1$ and $m_2$ are the component masses, $L_{\rm tot}$ is the magnitude of the total
angular momentum vector in the centre of mass frame, and $E_{\rm bind}$ is the binding energy.
The angular momentum is calculated using only the line-of-sight velocity and the
projected on-sky separation between the considered components,
$L_{\rm tot} = [m_1 m_2/(m_1 + m_2)]\,[(x_2 - x_1, y_2 - y_1, 0) \times (0, 0, v_2 - v_1)]$, where $x_i$ and $y_i$ are the on-sky
relative positions and $v_i$ are the line-of-sight velocities. The binding energy is negative,
and the semimajor axis is $a_{\rm in}$ = 1,666 AU with a high eccentricity of 0.99. We
note that changing the component masses has a minimal effect on the semimajor
axis and eccentricity values (if all the masses are 0.2 $M_{\rm sun}$, then $a_{\rm in}$ = 1,663 AU, $e_{\rm in}$ =
0.99 and the system remains bound). However, the determination of a high eccentricity
must be taken with caution, because its uncertainty is mostly due to the lack
of knowledge about the full orbital elements.
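
A minimal sketch of the inner-binary calculation is given below, assuming both components have 0.2 solar masses (the test case mentioned in the text) and using the B5-IRS1 and B5-Cond2 coordinates and velocities from Extended Data Table 1. The function names are hypothetical. Because the projected separation lies in the plane of the sky and the velocity difference is along the line of sight, the cross product reduces to the product of the two magnitudes.

```python
import math

# G expressed in AU (km/s)^2 per solar mass.
G = 6.6743e-11
M_SUN = 1.98841e30
AU = 1.495978707e11
G_UNITS = G * M_SUN / AU / 1e6


def projected_separation_au(ra1_s, dec1_as, ra2_s, dec2_as, dec_deg, distance_pc):
    """On-sky separation in AU from RA (seconds of time) and Dec (arcsec) offsets."""
    dx = (ra2_s - ra1_s) * 15.0 * math.cos(math.radians(dec_deg))
    dy = dec2_as - dec1_as
    return math.hypot(dx, dy) * distance_pc


def binary_orbit(m1, m2, sep_au, dv_kms):
    """Binding energy, semimajor axis (AU) and eccentricity of a bound pair."""
    mu = m1 * m2 / (m1 + m2)
    e_bind = 0.5 * mu * dv_kms**2 - G_UNITS * m1 * m2 / sep_au
    l_tot = mu * sep_au * dv_kms          # separation and velocity are perpendicular
    a = -G_UNITS * m1 * m2 / (2.0 * e_bind)   # meaningful only when e_bind < 0
    ecc_sq = 1.0 + 2.0 * (m1 + m2) * l_tot**2 * e_bind / (G_UNITS**2 * m1**3 * m2**3)
    return e_bind, a, math.sqrt(max(ecc_sq, 0.0))


# B5-IRS1 and B5-Cond2: RA seconds past 03:47:00, Dec arcsec past 32:51:00.
sep = projected_separation_au(41.548, 43.57, 41.627, 56.81, 32.86, 250.0)
e_bind, a_in, ecc_in = binary_orbit(0.2, 0.2, sep, 10.23 - 10.21)
print(f"separation = {sep:.0f} AU, a_in = {a_in:.0f} AU, e_in = {ecc_in:.3f}")
```

With these inputs the semimajor axis lies close to the value quoted for equal 0.2 solar-mass components, and the eccentricity is close to one because the measured velocity difference is small compared with the circular speed at this separation.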

Next, we treat the central system as the inner orbit of a triple system and deter- 
mine whether either [(B5-IRS1 + B5-Cond2) + B5-Cond3] or [(B5-IRS1 + B5-Cond2) 
+ B5-Cond1] could be a stable triple system. The former system could be a stable 
triple because it has a negative binding energy; the outer semimajor axis of this 
system is $a_{\rm out}$ = 4,060 AU, the eccentricity is 0.50, and the inner and outer orbit
periods are 2 × 10⁵ yr and 5 × 10⁵ yr, respectively.
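
The order of magnitude of these periods follows from Kepler's third law. The sketch below uses the quoted semimajor axes; the total masses of 0.2 and 0.3 solar masses are assumptions made only for this check, and the function name is hypothetical.

```python
import math


def orbital_period_yr(semimajor_axis_au, total_mass_msun):
    """Kepler's third law in solar units: P^2 = a^3 / M (P in yr, a in AU, M in Msun)."""
    return math.sqrt(semimajor_axis_au**3 / total_mass_msun)


p_inner = orbital_period_yr(1666.0, 0.2)   # assumed total mass of the inner pair
p_outer = orbital_period_yr(4060.0, 0.3)   # assumed total mass of the triple
print(f"P_inner = {p_inner:.1e} yr, P_outer = {p_outer:.1e} yr")
```

Both periods come out at a few times 10⁵ yr, several times longer than the free-fall time of the condensations.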

However, we note that the on-sky separation of the outer system [(B5-IRS1
+ B5-Cond2) + B5-Cond3] is less than 5-10 times that of the inner system (B5-IRS1 +
B5-Cond2), which makes the triple highly likely to be unstable over long
timescales.

Finally, we find that the [(B5-IRS1 + B5-Cond2) + B5-Cond1] system has a pos- 
itive binding energy, and is therefore not a stable bound system. 

In summary, the four objects together are bound, but they do not constitute a 
stable hierarchical quadruple system. The central (B5-IRS1 + B5-Cond2) system 
is a bound binary (albeit with high eccentricity) and may make up a triple system 
with [(B5-IRS1 + B5-Cond2) + B5-Cond3]. 
The fate of this system will probably be determined by internal evolution, that is,
whether or not it becomes unstable, rather than by external perturbations, owing
to it forming in a low stellar density environment.

Code availability. The code used in this research is freely available at
https://github.com/jpinedaf/B5_wide_multiple; it makes use of Astropy.
Extended Data Figure 1 | Dust continuum emission maps of Barnard 5.
a, b, Dust continuum emission observed with SCUBA-2 at 450 μm (a) and
850 μm (b). Contour levels are drawn at 3, 6, 9, 12 and 15 × rms, where rms is
0.23 mJy per pixel and 0.026 mJy per pixel in the 450 μm and 850 μm maps,
respectively. Emission from dust associated with the protostar B5-IRS1
dominates the field, and it makes it difficult to extract the emission from
B5-Cond2 and B5-Cond3. B5-Cond1 is clearly detected in the dust continuum
emission. Since the dust continuum emission does show the presence of the
filaments in low level emission, we conclude they are real column density
features. Filled white circles at bottom left corners show the angular resolution
of the observations. Blue- and orange-filled circles show the condensation
centres, while the filled stars indicate the protostar (B5-IRS1) location.


©2015 Macmillan Publishers Limited. All rights reserved 


[Extended Data Figure 2: plot of condensation mass (Msun) against condensation effective radius (AU) for B5-Cond1, B5-Cond2 and B5-Cond3.]


Extended Data Figure 2 | Mass and virial parameter as a function of radius 
for condensations. a, The enclosed condensation mass, derived from 
NH3(1,1), at different effective radii for each condensation; b, the
corresponding virial parameter as a function of effective radius for each
condensation. The condensation mass grows rapidly with radius, with a profile
similar to one expected for a density distribution of $\rho \propto r^{-1.5}$ (dotted line in
a) until it is close to the condensation boundary. In comparison, the dashed line
shows the expected result in hydrostatic equilibrium ($\rho \propto r^{-2}$), which is
different to the observed distribution. The virial parameter decreases with
radius until it reaches a minimum of ~1.5, and then it slowly increases until it
reaches the condensation boundary. Notice that virial parameters below the
horizontal line ($\alpha = 2$) imply bound condensations. The grey shaded region
marks the regime where the effective radius is smaller than the angular 
resolution of the observations. The circles show the values at the half-mass 
radius. 
Extended Data Table 1 | Condensation and protostar parameters

Source     RA               Dec.            V_LSR        F_NH3          F_450
           (hh:mm:ss.sss)   (dd:mm:ss.ss)   (km s⁻¹)     (Jy km s⁻¹)    (Jy)
B5-IRS1    03:47:41.548     32:51:43.57     10.21±0.04   ...            ...
B5-Cond1   03:47:38.928     32:52:15.31     10.43±0.03   0.349±0.001    0.992±0.009
B5-Cond2   03:47:41.627     32:51:56.81     10.23±0.01   0.251±0.001    ...
B5-Cond3   03:47:42.778     32:51:30.31     10.30±0.01   0.285±0.001    ...

RA, right ascension; Dec., declination; V_LSR, central velocity of the observed molecular line; F_NH3, flux measured in the NH3(1,1) integrated intensity map; F_450, flux measured in the SCUBA-2 450 μm map.
Comet 67P/Churyumov-Gerasimenko sheds dust
coat accumulated over the past four years 


Rita Schulz, Martin Hilchenbach, Yves Langevin, Jochen Kissel, Johan Silen, Christelle Briois, Cecile Engrand,
Comets are composed of dust and frozen gases. The ices are mixed
with the refractory material either as an icy conglomerate, or as an
aggregate of pre-solar grains (grains that existed prior to the formation
of the Solar System), mantled by an ice layer. The presence
of water-ice grains in periodic comets is now well established.
Modelling of infrared spectra obtained about ten kilometres from
the nucleus of comet Hartley 2 suggests that larger dust particles are
being physically decoupled from fine-grained water-ice particles that
may be aggregates, which supports the icy-conglomerate model. It
is known that comets build up crusts of dust that are subsequently
shed as they approach perihelion. Micrometre-sized interplanetary
dust particles collected in the Earth’s stratosphere and certain
micrometeorites are assumed to be of cometary origin. Here we
report that grains collected from the Jupiter-family comet
67P/Churyumov-Gerasimenko come from a dusty crust that quenches
the material outflow activity at the comet surface. The larger grains
(exceeding 50 micrometres across) are fluffy (with porosity over
50 per cent), and many shattered when collected on the target plate,
suggesting that they are agglomerates of entities in the size range of
interplanetary dust particles. Their surfaces are generally rich in
sodium, which explains the high sodium abundance in cometary
meteoroids. The particles collected to date therefore probably represent
parent material of interplanetary dust particles. This argues
against comet dust being composed of a silicate core mantled by organic
refractory material and then by a mixture of water-dominated
ices. At its previous recurrence (orbital period 6.5 years), the comet’s
dust production doubled when it was between 2.7 and 2.5 astronomical
units from the Sun, indicating that this was when the nucleus
shed its mantle. Once the mantle is shed, unprocessed material starts
to supply the developing coma, radically changing its dust component,
which then also contains icy grains, as detected during encounters
with other comets closer to the Sun.

Since August 2014, the ESA Comet Rendezvous Mission, Rosetta,
has been in orbit around the Jupiter-family comet 67P/Churyumov- 
Gerasimenko, monitoring the evolution of the comet’s nucleus, near- 
nucleus region, and inner coma as a function of increasing solar flux 
input, as the comet moves towards the Sun. As part of these studies, the 
COmetary Secondary Ion Mass Analyser (COSIMA) onboard Rosetta
is collecting comet grains from the near-nucleus region and the inner 
coma onto special target plates, which are subsequently imaged and
compositionally investigated by time-of-flight secondary ion mass spec- 
trometry using an indium ion source. The grain collection commenced 


at a heliocentric distance of 3.57 astronomical units (where 1 AU is the
average Sun-Earth distance), when the comet was still at low activity. 
The optical analysis of the grains captured on the target plates at dis- 
tances beyond 3 AU shows that most have fragmented upon capture and 
a large fraction of grains more than 50 μm across have shattered.
Figure 1a shows a typical example of a dust particle that has crumbled 
into a rubble pile upon collection, while Fig. 1b shows an example of a 
dust particle that has shattered into a loosely connected cluster with a 
wide range of sub-component sizes. These two types of feature are re- 
presentative of most large particles collected at less than 30 km from the 
nucleus during the first three months of the orbital phase. Given that the 
dust particles hit the target with a relatively low velocity (1-10 m s⁻¹),
their tensile strength must be very low. From the inertial deceleration 
forces upon grain capture the strength of the material can be approxi- 
mated, and a first rough estimate relevant for the present fragmenta- 
tion process is on the order of 1,000 Pa. 

The disintegration of cometary grains in the coma is often described 
as resulting from an icy grain component that evaporates when exposed 
to solar radiation, producing a secondary source for comet gaseous 
material. A dusty secondary source can, however, also be attributed
to certain organic grains that are not mantled by water ice. The coma
dust returned by Stardust featured various types of grain, including
specimens that had disintegrated along the deceleration tracks when
entering the aerogel (the ultralight porous gel in which the grains were
captured) at velocities of the order of 6 km s⁻¹, and hence were composed
of very fine or thermally unstable components. The morphology
of the grains collected by COSIMA supports the presence of solely
refractory material. A grain composed of an ice—mineral mixture would 
not shatter at low-velocity collection; instead, the icy part of such a 
grain would evaporate very shortly after collection, leaving one or more 
voids in the particle that remains on the target plate. Grains composed 
of (nearly) pure water-ice would evaporate at or shortly after collection 
and create a dark signature on the target plate. At the scale of the 
COSIMA image resolution (pixel size is 14 μm), there is no hint of
volatiles having left the grains after collection. In other words, there is 
no indication of an ice—mineral mixture, or of pure icy grains hitting 
the target. This is in contrast to cometary grains remotely observed, or 
collected before the Rosetta mission. 

Figure 1 | Dust particles. a, An example of a dust particle that crumbled into a
rubble pile when collected. The particle was collected at a nucleus distance
of 10-20 km, between 25 and 31 October 2014, with corresponding heliocentric
distance range 3.11-3.07 AU. The image was obtained with two different grazing
illumination conditions (top image illuminated from the right, bottom
image from the left). The brightness is presented in logarithmic scale to
emphasize the shadows, which indicate that the altitude above the target
reaches about 100 μm. As the particle lies 4.2 mm below the centre of the
collecting target, the shadows are tilted with regard to the horizontal direction.
b, An example of a dust particle that shattered when collected. The distance,
time of collection, illumination conditions, and logarithmic scale are the
same as for a. The shadows indicate that the altitude above the target
reaches about 60 μm. The two grains visible on the right are not part of the
shattered cluster.

The most important difference between the Stardust and COSIMA
grains is the heliocentric distance at which they were captured. The Stardust
samples were collected during a comet fly-by at 1.85 AU, whereas
the grains collected by COSIMA were dragged off the nucleus of a
re-approaching comet at heliocentric distances greater than 3 AU (as
67P/Churyumov-Gerasimenko returned from its aphelion passage at 
5.68 AU having spent about four years at a distance beyond 4 AU). These
COSIMA grains therefore come from a dusty layer that has built up over 
those four years, when the comet was so far from the Sun that the solar 
radiation was no longer able to create a gas drag that could efficiently 
remove the dust. The dust therefore remained on the surface, building 
up an ice-free, fluffy layer, below which lies an ice—dust mixture. When 
the comet returned to regions of higher solar irradiation the evapora- 
tion rate of the volatile gases underneath the dust layer increased again, 
lifting the particles from the dry upper dust layer into the inner coma, 
and leaving their original dusty cohabitants (dust frozen together with
the gas) behind. This left-behind dust replenishes the existing dusty layer 
from below, thereby maintaining its thickness in a quasi-steady state 
until the solar radiation is high enough that the amount of dust re- 
moved from the upper layer is larger than the new volatile-free dust 
produced underneath. As a consequence, the dusty layer will disappear 
over time and fresh material will come to the surface. The transition may 
be gradual but could be violent if there is a hard zone under the dusty 
layer (as may be indicated by the re-bounce of the Philae lander) below 
which high gas pressures are building up. From the increase in dust 
production rate observed telescopically in 2008 (ref. 14) we infer that 
the dusty layer was lost at some stage between 2.7 AU and 2.5 AU. That 
orbital section will be reached again during the present recurrence of 
the comet between 24 December 2014 and 20 January 2015, so the loss 
of the dusty layer has probably already occurred. 

The mass spectra of the surface of the COSIMA grains collected 
beyond 3 AU show a high abundance of sodium. Preliminary values obtained
after calibration are as high as 0.8, normalized to Mg = 1. For
comparison, the Na abundances (Mg = 1) for comet 81P/Wild-2 are 0.13
(collected in aerogel) and 0.2 (collected on aluminium foil), 0.1 ± 0.06
for comet 1P/Halley, and 0.055 for CI chondrites. The Na abundance
observed in Perseid and Leonid meteoroids is a factor of 1.5 higher
than the chondritic value, which fits very well with the value measured
by COSIMA. Furthermore, the fluffiness of the COSIMA grains suggests
that they would fragment with time after release into the coma.
From remote observations, such fragmentation of coma grains has regularly
been proposed. Therefore we conclude that the high Na abundance
measured by COSIMA, combined with the fluffiness of the grains,
supports the hypothesis that these grains represent the parent popu- 
lation of interplanetary dust particles in meteor streams of cometary 
origin. 

Beyond 3 AU, COSIMA has not collected any of the dust that is mixed
with sublimating ice, but rather the dust that is present in the upper ice- 
free dust layer. When the comet loses its fluffy mantle, it is expected that 
the properties of the grains collected will be very different from those 
of the grains currently under analysis, which show the properties of 
‘space-weathered’ comet refractory material. The fresh material is likely 
to be a mixture of ice and dust, and its analysis should provide the de- 
tailed structure of this mixture. However, when the comet returns to 
the outer Solar System, a new dusty mantle will form as the upper layer 
again becomes free of ice. The formation of such a mantle was considered
for re-occurring comets and detailed models exist for short-period
comet nuclei. The physical processes and timescales of these
models are consistent with assumptions made about the nucleus size, 
orbit and so on for 67P/Churyumov-Gerasimenko. Therefore, the grains 
collected from this comet provide direct evidence for the existence of its 
dusty mantle and also an indication of the structure of dust mantles in 
short-period comets. 

Boron isotope evidence for oceanic carbon dioxide
leakage during the last deglaciation 


M. A. Martinez-Boti, G. Marino, G. L. Foster, P. Ziveri, M. J. Henehan, J. W. B. Rae, P. G. Mortyn & D. Vance


Atmospheric CO2 fluctuations over glacial-interglacial cycles remain
a major challenge to our understanding of the carbon cycle and the
climate system. Leading hypotheses put forward to explain
glacial-interglacial atmospheric CO2 variations invoke changes in deep-ocean
carbon storage, probably modulated by processes in the Southern
Ocean, where much of the deep ocean is ventilated. A central aspect
of such models is that, during deglaciations, an isolated glacial
deep-ocean carbon reservoir is reconnected with the atmosphere, driving
the atmospheric CO2 rise observed in ice-core records. However,
direct documentation of changes in surface ocean carbon content and
the associated transfer of carbon to the atmosphere during
deglaciations has been hindered by the lack of proxy reconstructions that
unambiguously reflect the oceanic carbonate system. Radiocarbon
activity tracks changes in ocean ventilation, but not in ocean carbon
content, whereas proxies that record increased deglacial upwelling
do not constrain the proportion of upwelled carbon that is degassed
relative to that which is taken up by the biological pump. Here we
apply the boron isotope pH proxy in planktic foraminifera to two
sediment cores from the sub-Antarctic Atlantic and the eastern
equatorial Pacific as a more direct tracer of oceanic CO2 outgassing. We
show that surface waters at both locations, which partly derive from
deep water upwelled in the Southern Ocean, became a significant
source of carbon to the atmosphere during the last deglaciation, when
the concentration of atmospheric CO2 was increasing. This oceanic
CO2 outgassing supports the view that the ventilation of a deep-ocean
carbon reservoir in the Southern Ocean had a key role in the
deglacial CO2 rise, although our results allow for the possibility that
processes operating in other regions may also have been important for
the glacial-interglacial ocean-atmosphere exchange of carbon.




The modern Southern Ocean is a region of vigorous upwelling of
carbon- and nutrient-rich waters. Much of the upwelled CO2 is
outgassed to the atmosphere, owing to incomplete nutrient utilization in the
Southern Ocean surface. These waters are then resubducted as
intermediate waters and feed the thermocline of the low-latitude oceans, such
as the eastern equatorial Pacific Ocean (EEP), which is at present one
of the main oceanic sources of CO2 to the atmosphere (Fig. 1). Many
of the mechanisms proposed to reduce atmospheric CO2 during
glacial periods focus on a reduction of Southern Ocean CO2 leakage, via
increased ocean stratification or a more efficient biological pump
(probably boosted by iron fertilization), or both. During deglaciation, this
situation is reversed, and previously isolated deep-ocean carbon is thought
to be upwelled and re-exposed to the atmosphere.

Deglacial upwelling of aged, nutrient-rich waters in the Southern
Ocean, and their subsequent advection to the EEP, has been suggested
on the basis of several proxies, including biogenic opal fluxes, stable
carbon isotope ratios (δ13C) and radiocarbon activities (Δ14C). Although
these reconstructions provide valuable insights into ocean nutrient
dynamics, circulation, and ventilation history, none of them directly
tracks ocean-atmosphere CO2 exchange. For instance, if biological
productivity efficiently used upwelled nutrients and carbon, as possibly
indicated by the deglacial increase in Southern Ocean and EEP opal fluxes,
then CO2 leakage to the atmosphere may have been damped or even
negated completely. Similarly, the appearance of low δ13C signatures
in the upper ocean at a wide number of locations globally during the
deglaciation, which is usually taken as evidence for the re-ventilation
of nutrient-enriched (low-δ13C) waters, could be modulated by changing
air-sea CO2 fractionation. Finally, perhaps the most compelling
evidence so far for the importance of the recommunication of a deep-ocean
carbon reservoir with the atmosphere via the Southern Ocean comes
from the collapse in Δ14C gradients in deep waters around Antarctica
due to the deglacial breakdown of stratification in that region. However,
although this correlates well with the atmospheric Δ14C record
and the first rise in atmospheric partial pressure of CO2 (pCO2^atm) during
Heinrich stadial 1, there is little response during the second rise of
pCO2^atm centred on the Younger Dryas stadial.

Figure 1 | Location of cores PS2498-1 and ODP1238. Site locations are overlaid on a map of mean annual ΔpCO2 (ref. 11).

There is therefore an urgent requirement for direct evidence of
deglacial CO2 variations in the surface waters of key areas of the global
ocean, such as the Southern Ocean and the EEP, using which it will be possible
to test the hypothesis that oceanic CO2 outgassing was central to the
glacial-interglacial pCO2^atm rise. We addressed this issue by analysing the
boron isotopic composition (δ11B) of planktic foraminifera, a proxy
for oceanic pH that provides a direct link to seawater CO2 content.
Figure 2 compares records of pCO2^atm (ref. 5) with new planktic
foraminiferal δ11B and Mg/Ca-derived temperature for the sub-Antarctic
Atlantic (SAA) (site PS2498-1; 44.15° S, 14.23° W, 3,783 m water depth)
and the EEP (site ODP1238; 1.87° S, 82.78° W, 2,203 m water depth)
(Fig. 1). Surface waters at PS2498-1 are at present influenced by water
upwelled in the Antarctic zone and advected northwards via Ekman
pumping, whereas ODP1238 is mostly influenced by the Pacific
equatorial undercurrent, a subsurface current originating in the western
equatorial Pacific and fed by water masses of Southern Ocean (~70%)
and North Pacific (~30%) origin. Previous literature has suggested
an effective connection between the sub-Antarctic zone and the EEP
via intermediate waters, through a process often referred to as ‘oceanic
tunnelling’ (Methods).

Figure 2 | δ11B, pH and ΔpCO2 records from the SAA and the EEP during
the last deglaciation. a-e, SAA; f-j, EEP. a, Globigerina bulloides
δ11B = ((11B/10B)sample/(11B/10B)NIST951 − 1) × 1,000‰ (Methods) with
analytical uncertainties (1σ, dark grey envelope; 2σ, light grey envelope).
b, δ11B-based pH (blue) reconstruction for PS2498-1 and calculated seawater
pH in equilibrium with atmospheric pCO2 (pCO2^atm) (green curve) at the same site.
c, δ11B-derived ΔpCO2 (pCO2^sw − pCO2^atm). Filled black circle denotes present-day
annual average ΔpCO2 near PS2498-1 with 2σ uncertainties (ref. 11). d, Globigerina
bulloides Mg/Ca-based sea surface temperature (SST). e, j, pCO2^atm from Antarctic
ice cores. f, Globigerinoides sacculifer δ11B with analytical uncertainties (1σ,
dark grey shading; 2σ, light grey shading). g, δ11B-based pH (red) for ODP1238
and calculated seawater pH in equilibrium with pCO2^atm (ref. 5) (green curve).
h, δ11B-based ΔpCO2. Filled black circle denotes present-day mean annual
ΔpCO2 near ODP1238 with 2σ uncertainties (ref. 11). i, Globigerinoides sacculifer
Mg/Ca-based SST. The late-Holocene data in f, g and h have been averaged
(individual measurements are shown as open red circles) (Methods).
Envelopes in b, c, g and h are 68% and 95% uncertainty bounds (light blue or
red shading and dotted lines, respectively) based on a LOESS regression of the
δ11B-derived records using a Monte Carlo approach; thick line denotes the
maximum-probability fit to the data. YD, Younger Dryas; BA, Bølling-Allerød;
ACR, Antarctic cold reversal; HS1, Heinrich stadial 1; LGM, Last Glacial
Maximum. Filled triangles at the bottom indicate calibrated 14C ages
(Methods).

The δ11B record for the surface-dwelling foraminifer Globigerina
bulloides in the sub-Antarctic core PS2498-1 displays a change of ~1.3‰
between the early deglaciation and the Holocene epoch (~0.13 pH units;
Fig. 2a, b). Unfortunately, the very low abundance of G. bulloides before
~16 kyr ago precluded δ11B analyses for the Last Glacial Maximum,
which limited our evaluation of the full glacial-interglacial δ11B (and
pH) shift at this site. The PS2498-1 δ11B record does not exhibit a
gradual change through the deglaciation into the Holocene, but features two
distinct decreases (lower pH). In the EEP (at ODP1238), there is a brief
negative excursion in the δ11B of the surface-dweller Globigerinoides
sacculifer at 19 kyr ago, and the ensuing deglacial change is larger (~2‰;
~0.2 pH units) and more gradual than that recorded by G. bulloides at
PS2498-1 (Fig. 2f, g and Extended Data Fig. 1).

To highlight the main patterns of variability in the δ11B-derived time
series and to probabilistically account for all the uncertainties
associated with our reconstructions, we used a Monte Carlo approach that
uses a non-parametric regression (LOESS function; Fig. 2b, c, g, h and
Methods). A comparison of the PS2498-1 and ODP1238 pH records
with the pH expected if surface waters remained in equilibrium with the
atmosphere (Fig. 2b, g, green lines) reveals ‘excess’ surface ocean
acidification in both these areas during the deglaciation and early Holocene.
To gain further insight into deglacial ocean-atmosphere CO2 exchange,
we use the δ11B-derived pH data, along with estimates of temperature,
salinity and alkalinity, to calculate the partial pressure of CO2 in
seawater (pCO2^sw; Methods). Note that the temperature, salinity and
alkalinity have a minor effect on pCO2^sw (at most ±13 μatm, ±15 μatm and
±12 μatm, respectively), which is driven mainly by the δ11B-derived
pH, given the strong correlation of pH and [CO2] within the ocean
carbonate system. The pCO2^sw profiles are compared with pCO2^atm from
ice-core records to yield ΔpCO2 = pCO2^sw − pCO2^atm and, notably, our
late-Holocene ΔpCO2 estimates agree within uncertainties with modern
water-column data from nearby locations (ref. 11), supporting the accuracy of our
δ11B-pH calibrations (Fig. 2c-h).

Our EEP ΔpCO2 record reveals that this region became a significant
source of CO2 to the atmosphere during the deglaciation and the early
Holocene, reaching, ~13 kyr ago, a peak of +90 ± 16 μatm (2σ), which
exceeds by ~45 μatm the average modern values at this location (ref. 11)
(Fig. 2h). Comparison with earlier studies in the western and central
Pacific that reported deglacial ΔpCO2 values of +100 to ≈+185 μatm
shows the widespread nature of the CO2 outgassing in the equatorial
Pacific (Extended Data Fig. 2). The ΔpCO2 data from the SAA also
indicate that this region acted as a source of CO2 to the atmosphere from
~16 to ~7.5 kyr ago, but in the form of two prominent multimillennial
events, with a maximum of +50 ± 18 μatm (that is, ~65 μatm higher
than present-day average values; ref. 11) ~15 kyr ago (Fig. 2c).
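
As a small worked illustration of the ΔpCO2 bookkeeping in this paragraph (a sketch, not the authors' code), the snippet below applies the definition ΔpCO2 = pCO2^sw − pCO2^atm and backs out the approximate modern averages implied by the quoted peaks and offsets. The function names and the 330/240 μatm input pair are hypothetical; the peak and offset values are the ones stated in the text.

```python
def delta_pco2(pco2_sw, pco2_atm):
    """Return delta-pCO2 in uatm; positive means the surface ocean is a CO2 source."""
    return pco2_sw - pco2_atm


def implied_modern_average(deglacial_peak, excess_over_modern):
    """Modern delta-pCO2 implied by a deglacial peak and its stated excess over modern."""
    return deglacial_peak - excess_over_modern


# Values quoted in the text (central estimates only, uncertainties ignored).
eep_modern = implied_modern_average(90.0, 45.0)   # EEP: peak +90, ~45 above modern
saa_modern = implied_modern_average(50.0, 65.0)   # SAA: peak +50, ~65 above modern

# Hypothetical seawater/atmosphere pair, chosen only to show the sign convention.
example = delta_pco2(330.0, 240.0)

print(eep_modern, saa_modern, example)
```

Under these numbers the EEP is a source today as well as during the deglaciation, whereas the SAA's implied modern average is slightly negative, so its deglacial peak marks a change of sign and not just of magnitude.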

The occurrence of upper-ocean acidification in the SAA and the EEP
during the deglaciation coincided, within uncertainties, with excursions
to higher opal fluxes at nearby sites (Fig. 3a, b, f, g). This observation
provides compelling evidence for the proposed link between, on the one
hand, the resumption of Antarctic upwelling and attendant Ekman
transport of deep waters enriched in nutrients and [CO2], and, on the
other hand, the increase in oceanic ΔpCO2 in the wider Southern Ocean.
It further suggests that advection to the EEP, via oceanic tunnelling, of
these CO2-rich waters may (at least partly) explain the deglacial ΔpCO2
maximum documented at ODP1238. This interpretation is
corroborated by the broad synchronicity of the acidification phases with
excursions to depleted surface ocean δ13C at our core locations (Fig. 3c, h),
with the deglacial rise in the concentration of atmospheric CO2 (Fig. 3d, j),
and with the negative shift in the δ13C of atmospheric CO2 (Fig. 3e, k).

Figure 3 | ΔpCO2 and δ13C records from the SAA and the EEP during the
last deglaciation compared to opal fluxes, pCO2^atm and atmospheric δ13C.
a-e, SAA; f-k, EEP. a, Surface ocean ΔpCO2 reconstruction for PS2498-1.
b, Opal fluxes for TN057-13-4PC (Antarctic zone). c, PS2498-1 G. bulloides
δ13C = ((13C/12C)sample/(13C/12C)VPDB − 1) × 1,000‰ (blue dots; VPDB,
Vienna PeeDee Belemnite) with a LOESS non-parametric regression of the
data (span = 0.4) (solid line; Methods). d, j, Atmospheric CO2 concentrations
from Antarctic ice cores. e, k, Atmospheric δ13C record from Antarctic ice
cores. f, Surface ocean ΔpCO2 reconstruction in ODP1238. g, Opal fluxes
for V19-30 (EEP). h, i, ODP1238 G. sacculifer (i) and Neogloboquadrina
dutertrei (h) δ13C with a LOESS non-parametric regression of the data (solid
lines; span = 0.7 (h) and 0.33 (i)). Envelopes in a and f are 68% and 95%
uncertainty bounds (light blue or red shading and dotted lines, respectively)
based on a LOESS regression of the δ11B-derived records using a Monte Carlo
approach; thick line denotes the maximum-probability fit to the data.
Envelopes in e and k are 2σ uncertainty bounds around the Monte Carlo
average (thick line).

However, the differences in the patterns of pH, ΔpCO2, δ13C and opal
at the two locations (Figs 2 and 3 and Extended Data Fig. 1) suggest
that the EEP record does not represent merely a downstream
expression of the deglacial CO2 outgassing in the Southern Ocean, and that
additional processes or CO2 sources, or both, need to be considered to
fully explain its structure. Intermediate waters originating in the
Southern Ocean incorporate remineralized carbon during transit to the EEP,
which may modify their geochemical characteristics. Additionally,
because source waters of the Pacific equatorial undercurrent also include
waters originating in the North Pacific (~30%), these may provide a
potential supplementary source of CO2. During the early Holocene,
both our ΔpCO2 records also indicate continued oceanic CO2
outgassing, yet pCO2^atm shows little change. This suggests that ocean outgassing
at this time was balanced by CO2 uptake, possibly due to forest regrowth
and peat build-up as the continental ice sheets retreated from the
northern continents, in line with the near-contemporaneous rise in
atmospheric δ13C (Fig. 3e, k).

Although millennial changes in the mean position of Southern Ocean
frontal systems, and in the upwelling strength in the EEP due to shifts
in the intertropical convergence zone, have the potential to change
ΔpCO2 at our sites, such changes seem unlikely to exert a dominant
influence on our records. If the high ΔpCO2 spikes we document were
driven by a northward shift of the regional oceanic fronts (PS2498-1)
or by increased upwelling (ODP1238), they should be accompanied by
surface cooling, whereas in fact both sites feature a warming trend during
these intervals (Fig. 2 and Methods). It is also unlikely that the signals
are driven by migration in foraminiferal depth habitat. In the SAA, pH
vertical gradients are small, whereas in the EEP migration to deeper and
more-acidic waters is inconsistent with the warming signal in Mg/Ca
(Fig. 2) and with the structural similarity between G. sacculifer δ18O
and δ13C records and those for the strict surface-dweller Globigerinoides
ruber (Methods and Extended Data Fig. 3).

Local changes in the efficiency of the biological pump, stimulated by
iron fertilization, also have the potential to influence surface water pCO2.
The evidence that the EEP was a weaker CO2 source (or even a sink)
during the Last Glacial Maximum may be ascribed to a dust-derived
relaxation of iron limitation and attendant strengthening of the
biological pump. It is also possible that the gradual decrease in the iron
supply to the EEP and sub-Antarctic regions during the deglaciation
could explain part of the high deglacial ΔpCO2 at these sites. However,
the marked differences between the structure of the dust flux records
and our ΔpCO2 reconstructions argue against this hypothesis (Extended
Data Fig. 4).

It therefore seems most plausible that the ΔpCO2 pulses that we report
here using planktic foraminifera δ11B reflect oceanic CO2 outgassing
during the last deglaciation. A common origin for the ΔpCO2 anomalies
presented here probably involved the renewed upwelling of aged deep
water enriched in nutrients and carbon in the Southern Ocean, which
subsequently ‘leaked’ into the atmosphere, contributing to the deglacial
pCO2^atm rise and the atmospheric Δ14C and δ13C decreases. However,
our analysis also indicates that other carbon sources, possibly located
in the wider Pacific Ocean, may need to be invoked to fully explain
glacial-interglacial ocean-atmosphere CO2 exchange.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 

METHODS

Material and methods. Core PS2498-1 was retrieved by the RV Polarstern from
the eastern flank of the mid-Atlantic Ridge (44.15° S, 14.23° W, 3,783 m water depth),
in the sub-Antarctic Atlantic Ocean (Fig. 1). Between 400 and 650 individuals
of G. bulloides were picked from the 300-355 μm size fraction (10 μg per shell with
B/Ca ~ 35 μmol mol^-1) from this site. Ocean Drilling Program (ODP) Site 1238
(1.87° S, 82.78° W, 2,203 m water depth) was drilled during Leg 202 by the RV Joides
Resolution from the southern flank of Carnegie Ridge, off the shore of Ecuador in
the EEP (Fig. 1). From ODP1238 we picked 15-90 individuals of G. sacculifer of
mixed morphotypes (see below) from the 425-500 μm size fraction (40 μg per shell
with B/Ca ~ 100 μmol mol^-1) for δ11B and trace elements, and 10-20 individuals
from the 355-425 μm size fraction for δ18O and δ13C. Twenty individuals of G. ruber
sensu stricto from the 250-355 μm size fraction were picked for δ18O and δ13C,
and 30 individuals of N. dutertrei were picked from the 355-500 μm size fraction
for trace elements and δ18O and δ13C. Foraminiferal samples were crushed between
cleaned glass microscope slides before cleaning and analysis.

Age models. For core PS2498-1, we converted existing accelerator mass
spectrometry (AMS) 14C data to calendar ages using the Calib 7.0 program with the
Marine13 data set. Local reservoir corrections of ΔR = 300 yr (ref. 37) and
ΔR = 900 yr were used for the past 16 kyr and between 16 and 26 kyr ago,
respectively (Extended Data Fig. 5a), following recent studies (compare with
ref. 39). Different ΔR values may affect the chronology by up to 0.9 kyr for the
LGM and early deglacial sections, although these potential changes in age model
do not impact our conclusions unduly (Extended Data Fig. 5b).

The age model for ODP1238 is based on ten new AMS 14C dates (Extended Data
Fig. 6a). Monospecific samples of N. dutertrei (~300 individuals, ~12 mg) were
analysed at the Lawrence Livermore National Laboratory Center for Accelerator
Mass Spectrometry. Carbon-14 ages were calibrated using the Calib 7.0 program
with the Marine13 data set. In line with previous studies in the EEP, an
average ΔR of 72 ± 35 yr was applied to all 14C data (http://calib.qub.ac.uk/marine).
To derive an age model for ODP1238, the relationship between sediment depth
and calibrated ages was fitted with a third-order polynomial regression (Extended
Data Fig. 6b). A potential complication in constructing 14C-based age models in
the EEP is the possible presence of old waters during some intervals of the last
deglaciation. However, the gradual increase in age with depth at ODP1238,
with no sign of age reversals (compare with refs 42, 43), suggests that our age model
is not likely to be influenced by extremely old 14C, or that the influence, if present,
was small. Moreover, there is good agreement between the benthic and planktic
δ18O records in ODP1238 and those of other EEP cores (Extended Data Fig. 7).
Although this does not represent a definitive validation, given that all these sites
could potentially be influenced by the same aged water mass, it documents the
coherence of our age model with the published literature for this region.
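
The third-order polynomial age-depth fit described above can be sketched as follows. This is only an illustration under stated assumptions: the ten depth/age tie points are synthetic (generated from a made-up cubic, not taken from ODP1238), the function names are hypothetical, and the fit is an ordinary least-squares solve of the normal equations, which may differ from the regression software actually used.

```python
def fit_polynomial(x, y, degree=3):
    """Least-squares polynomial fit; returns [c0, c1, ...] for c0 + c1*x + ..."""
    n = degree + 1
    # Normal equations: A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[sum(xi ** (i + j) for xi in x) for j in range(n)] for i in range(n)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def evaluate(coeffs, x):
    """Evaluate the polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


# Synthetic example: ten tie points (depth in m, age in kyr) from a made-up cubic.
true_coeffs = [1.0, 4.0, 0.8, 0.1]
depths = [0.2, 0.6, 1.0, 1.4, 1.8, 2.2, 2.6, 3.0, 3.4, 3.8]
ages = [evaluate(true_coeffs, d) for d in depths]

coeffs = fit_polynomial(depths, ages)
age_at_2m = evaluate(coeffs, 2.0)  # interpolated age for a sample at 2.0 m
print(coeffs, age_at_2m)
```

With real, noisy radiocarbon dates the fitted curve would not pass through every tie point, and it is worth checking that the fitted age increases monotonically with depth over the sampled interval.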

Foraminiferal signal carriers. Globigerina bulloides is ubiquitous in the South
Atlantic and its abundance seems to be generally associated with the
phytoplankton productivity maxima during austral summer. In the proximity of PS2498-1,
G. bulloides dwells in the upper ~60 m of the water column, with a distinct
maximum between 0 and 25 m (ref. 46).

A number of proxy systems based on G. sacculifer in the EEP have been
demonstrated to reflect mean annual environmental conditions. In the Panama Basin
the depth distribution of G. sacculifer is morphotype dependent: G. sacculifer
without a sac-like final chamber dwells predominantly in the surface mixed layer,
whereas G. sacculifer with a sac-like final chamber dwells in the thermocline (25-37 m).
Another study showed that close to the coastal upwelling region of the EEP (where
ODP1238 is located) the two morphotypes of G. sacculifer have similar depth ranges
(0-30 m), which also overlap with that of the surface-dweller G. ruber (0-25 m).
Globigerinoides sacculifer is also known to add gametogenic calcite at thermocline
depths, which can account for up to 30% of the final test weight. Although these
and other studies may imply the influence of a thermocline signal on the G. sacculifer
mixed-morphotype-based reconstructions, δ18O and δ13C data from ODP1238
(Extended Data Fig. 3) indicate a similar depth habitat for G. sacculifer and the
surface-dweller G. ruber (which does not add gametogenic calcite) throughout the past
25 kyr.

δ13C and δ18O analyses. Prior to stable isotope determination, foraminiferal
samples were rinsed with methanol, ultrasonicated and then oven-dried at 40 °C.
Analyses of G. bulloides (PS2498-1, complementing ref. 55), G. sacculifer (ODP1238)
and N. dutertrei (ODP1238) were performed at the Universitat Autonoma de
Barcelona, using a Thermo Finnigan MAT253 mass spectrometer coupled to a Kiel IV
device for CO2 sample gas preparation. External reproducibility (1σ) of carbonate
standards was better than ±0.05‰ for δ18O and ±0.03‰ for δ13C. ODP1238
G. ruber δ18O and δ13C data were measured at MARUM-Center for Marine
Environmental Sciences, University of Bremen, using a Thermo Finnigan MAT252 mass
spectrometer coupled to a Bremen-type automatic carbonate preparation device.
The precision (1σ) was better than 0.07‰ for δ18O and 0.05‰ for δ13C, on the
basis of replicates of an internal limestone standard.

Trace element and δ11B analyses. Foraminiferal samples were oxidatively cleaned,
dissolved in ~0.15 M Teflon-distilled HNO3 and centrifuged. An aliquot (~20 μl;
~7% of the total sample) was taken for trace element analyses, which were
performed on a Thermo Scientific Element 2 single-collector ICPMS at the University
of Southampton. Mg/Ca ratios were converted to calcification temperatures using
the calibration of ref. 59 for G. sacculifer (ODP1238) and ref. 60 for G. bulloides
(PS2498-1). The analytical reproducibility for Mg/Ca was ±2.7% (2σ), and the efficiency
of the foraminiferal cleaning was verified using Al/Ca ratios (<100 μmol mol^-1
in all samples, and typically <60 μmol mol^-1). The N. dutertrei (355-500 μm) Mg/Ca
record was produced at the University of Bristol and at the University of
Southampton using the same standard set to ensure comparability. Samples were
reductively cleaned and measured on a Thermo Scientific Element 2 single-collector
ICPMS following ref. 17. Calcification temperatures were derived from ref. 59.
Boron was separated from the remaining sample using Amberlite IRA-743
boron-specific anion exchange resin. δ11B was measured on a Thermo Scientific Neptune
multicollector inductively coupled plasma mass spectrometer (MC-ICPMS) at the
University of Southampton. The external reproducibility of δ11B analyses was
calculated following the approach of ref. 58, and is described by the relationship


$$2\sigma = 1.87\,\exp\!\left(\ldots\,[^{11}\mathrm{B}]\right) + 0.22\,\exp\!\left(\ldots\,[^{11}\mathrm{B}]\right) \qquad (1)$$


where [11B] is the intensity of the 11B signal in volts.

Owing to the low abundance of G. sacculifer during the Holocene in ODP1238,
the associated 2σ uncertainties (calculated using equation (1)) were relatively large
(0.56-0.84‰). The four most recent Holocene samples have been averaged, with a
2σ uncertainty calculated as the mean of the individual uncertainties (from
equation (1)) divided by the square root of n − 1 (n = 4).
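
The averaging rule stated above (mean of the individual 2σ values divided by the square root of n − 1) can be written out as a short sketch. The four input values are hypothetical, chosen only to span the quoted 0.56-0.84‰ range, and the function name is an assumption for illustration.

```python
import math


def averaged_two_sigma(two_sigmas):
    """2-sigma of an averaged sample: mean of individual 2-sigmas / sqrt(n - 1)."""
    n = len(two_sigmas)
    if n < 2:
        raise ValueError("need at least two samples to average")
    return (sum(two_sigmas) / n) / math.sqrt(n - 1)


# Hypothetical per-sample 2-sigma values (permil) within the quoted range.
individual = [0.56, 0.65, 0.75, 0.84]
combined = averaged_two_sigma(individual)
print(round(combined, 3))
```

For n = 4 the divisor is the square root of 3, so the averaged uncertainty is a little under 60% of the mean individual uncertainty.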

The boron isotope pH proxy. The boron isotope pH proxy has been extensively
described in previous studies. Briefly, boron in seawater exists mainly as two
different species, boric acid (B(OH)3) and borate ion (B(OH)4^-), and their relative
abundance is pH dependent. There are two isotopes of boron, 11B (~80%) and 10B
(~20%), with a ratio normally expressed as


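$$\delta^{11}\mathrm{B} = \left(\frac{^{11}\mathrm{B}/^{10}\mathrm{B}_{\mathrm{sample}}}{^{11}\mathrm{B}/^{10}\mathrm{B}_{\mathrm{NIST951}}} - 1\right) \times 1{,}000\ \text{‰}$$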


where 11B/10B_NIST951 is the isotopic ratio of the NIST SRM 951 boric acid
standard (11B/10B = 4.04367).
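
A minimal sketch of the delta notation defined above, using the NIST SRM 951 ratio quoted in the text. The function names are hypothetical, and the 39.61‰ input is used here simply as a convenient round-trip check.

```python
NIST951_RATIO = 4.04367  # 11B/10B of NIST SRM 951, as quoted in the text


def delta11b(ratio_sample, ratio_standard=NIST951_RATIO):
    """Convert a measured 11B/10B ratio to delta-11B in permil."""
    return (ratio_sample / ratio_standard - 1.0) * 1000.0


def ratio_from_delta(delta, ratio_standard=NIST951_RATIO):
    """Inverse of delta11b: recover the 11B/10B ratio from delta-11B (permil)."""
    return ratio_standard * (delta / 1000.0 + 1.0)


ratio = ratio_from_delta(39.61)   # ratio corresponding to 39.61 permil
round_trip = delta11b(ratio)      # converts back to 39.61 permil
print(ratio, round_trip)
```

A sample with exactly the standard's ratio has δ11B = 0‰, and each 1‰ corresponds to a relative difference of one part in a thousand in 11B/10B.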

There is pronounced isotopic fractionation between the two boron species, with
boric acid being enriched in 11B by 27.2‰. Because the concentration of each species
is pH dependent, their isotopic composition also has to change with pH to
maintain a constant seawater δ11B. Calibration studies have shown that the borate
species is predominantly incorporated into foraminiferal CaCO3, and ocean pH
can therefore be calculated from the δ11B of borate as


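$$\mathrm{pH} = \mathrm{p}K^{*}_{\mathrm{B}} - \log\left(-\frac{\delta^{11}\mathrm{B}_{\mathrm{sw}} - \delta^{11}\mathrm{B}_{\mathrm{borate}}}{\delta^{11}\mathrm{B}_{\mathrm{sw}} - \left({}^{11\text{-}10}K_{\mathrm{B}}\,\delta^{11}\mathrm{B}_{\mathrm{borate}}\right) - 1{,}000\left({}^{11\text{-}10}K_{\mathrm{B}} - 1\right)}\right) \qquad (2)$$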


where pK*_B is the dissociation constant for boric acid at in situ temperature, salinity
and pressure, δ11B_sw is the isotopic composition of seawater (39.61‰), δ11B_borate
is the isotopic composition of borate ion, and 11-10K_B is the isotopic fractionation
between the two aqueous species of boron in seawater (1.0272 ± 0.0006).
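
Equation (2) can be evaluated directly once pK*_B is known. The sketch below uses the seawater δ11B and fractionation factor quoted in the text; the pK*_B of 8.60 and the borate values of 18 and 19‰ are hypothetical inputs chosen only for illustration, and the function name is an assumption.

```python
import math

D11B_SW = 39.61    # permil, seawater delta-11B quoted in the text
ALPHA_B = 1.0272   # isotopic fractionation factor quoted in the text


def ph_from_borate(d11b_borate, pk_b, d11b_sw=D11B_SW, alpha=ALPHA_B):
    """pH from the delta-11B of borate, following equation (2)."""
    numerator = d11b_sw - d11b_borate
    denominator = d11b_sw - alpha * d11b_borate - 1000.0 * (alpha - 1.0)
    # For realistic borate values the denominator is negative, so the ratio
    # needs the leading minus sign to keep the log argument positive.
    return pk_b - math.log10(-numerator / denominator)


PK_B_EXAMPLE = 8.60  # hypothetical pK*_B, not a value from the paper
ph_18 = ph_from_borate(18.0, PK_B_EXAMPLE)
ph_19 = ph_from_borate(19.0, PK_B_EXAMPLE)
print(round(ph_18, 3), round(ph_19, 3))
```

In this range a 1‰ increase in δ11B of borate raises the computed pH by a little under 0.1 units, which is in line with the ~1.3‰ and ~0.13 pH unit pairing reported for PS2498-1 in the main text.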

Calibration of δ11B pH proxy in G. bulloides. We combined core-top and
sediment trap data to calibrate the δ11B pH proxy in G. bulloides over a broad range
(~2‰) of δ11B_borate (ref. 69). Core-top samples were taken from core archives at
the University of Tübingen (Germany) and NIWA (New Zealand) (Extended Data
Fig. 8). Carbon-14 dating (samples from NIWA) and the presence of Rose
Bengal-stained living benthic foraminifera (samples from University of Tübingen)
confirmed recent ages for the samples used. pH was estimated for core-top sites using
surface water oceanographic data (following ref. 62), and regional total alkalinity/
salinity/temperature relationships. The pre-industrial partial pressure of CO2 in
seawater (pCO2^sw) at each core-top site was estimated by applying monthly
ocean-atmosphere ΔpCO2 interpolated from surrounding sites (ref. 11) (and corrected
for the post-industrial changes in flux) to a pre-industrial atmospheric pCO2 value
from ice-core data. Monthly estimates of pH were then calculated with CO2sys-Matlab,
using the constants of refs 67, 75, 76, and temperature and salinity data (ref. 11).
Monthly-resolved in situ δ11B_borate could then be calculated from pH, temperature
and salinity. For each sample site, the average of twelve monthly estimates of
δ11B_borate was taken as mean annual in situ δ11B_borate and two standard deviations
of the monthly variability were taken as 2σ uncertainty (representing intraannual
variation in δ11B_borate at each site). These core-top samples were complemented by
sediment trap samples from the Cariaco Basin CAR22(Z), collected in January 2007.
pH, temperature and salinity data for the sediment trap site are interpolated from
data from December 2006 and February 2007 (http://www.imars.usf.edu/CAR). These
new calibration data are presented and plotted in Extended Data Fig. 8. We can then
calculate δ11B_borate from G. bulloides δ11B (δ11B_calcite), with associated 2σ
uncertainties, as follows:


$$\delta^{11}\mathrm{B}_{\mathrm{borate}} = \frac{\delta^{11}\mathrm{B}_{\mathrm{calcite}} + 3.440\,(\pm 4.584)}{1.074\,(\pm 0.252)} \qquad (3)$$


Size fraction was not found to produce any effect on δ11B (ref. 69); in this study we
used the 300-355 μm size fraction.

δ11B pH proxy in G. sacculifer. Conversion of δ11B of G. sacculifer to pH follows
the approach described in ref. 77. Despite analytical biases between MC-ICPMS
and N-TIMS analysis, it can be assumed that the pH sensitivity of G. sacculifer
δ11B described by culture calibration (N-TIMS) is applicable to MC-ICPMS data.
The δ11B-pH calibration used here therefore incorporates both culture data,
which we recalculated following ref. 62, and existing core-top data (Extended Data
Fig. 8). To account for the analytical bias between MC-ICPMS and N-TIMS, we
corrected the data set of ref. 78 by applying an offset of −3.32‰. This is derived
from the comparison between core-top G. sacculifer measurements by MC-ICPMS
and N-TIMS from adjacent sites. The following equation (with 2σ uncertainties)
can then be used to calculate the appropriate δ11B_borate from G. sacculifer δ11B
(δ11B_calcite):


$$\delta^{11}\mathrm{B}_{\mathrm{borate}} = \frac{\delta^{11}\mathrm{B}_{\mathrm{calcite}} - 3.600\,(\pm 0.722)}{0.834\,(\pm 0.036)} \qquad (4)$$
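
For illustration, the two species-specific calibrations (equations (3) and (4)) can be applied as below. This sketch uses only the central intercept and slope values, reading the second number in each pair as its 2σ uncertainty. The function names and the example calcite values of 16.0 and 19.5‰ are hypothetical.

```python
def borate_from_bulloides(d11b_calcite):
    """Equation (3), central values only: G. bulloides calcite -> borate (permil)."""
    return (d11b_calcite + 3.440) / 1.074


def borate_from_sacculifer(d11b_calcite):
    """Equation (4), central values only: G. sacculifer calcite -> borate (permil)."""
    return (d11b_calcite - 3.600) / 0.834


# Hypothetical measured values, chosen only to show the direction of each offset.
bulloides_borate = borate_from_bulloides(16.0)
sacculifer_borate = borate_from_sacculifer(19.5)
print(round(bulloides_borate, 3), round(sacculifer_borate, 3))
```

With these inputs the G. bulloides calibration returns a borate value above the measured calcite value, whereas the G. sacculifer calibration returns one below it, which is why a species-specific conversion is applied before using equation (2).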


Carbonate system calculations: pH, pCO₂ and ΔpCO₂. To calculate pH from δ¹¹B_borate
(equation (2)), temperature and salinity estimates are required. Temperatures are
derived from foraminiferal Mg/Ca (see above) and salinity is calculated taking into
account the freshwater loss from the ocean due to the glacial growth of continental
ice sheets. A Monte Carlo approach was used to generate 10,000 realizations of
pH by randomly sampling the relevant input parameters within their given uncertainty
bounds (2σ): δ¹¹B plus/minus analytical uncertainty (equation (1)) and plus/minus
calibration uncertainty (equations (3) and (4)); temperature ±1 °C; and
salinity ±1 p.s.u. To fully propagate the calibration uncertainty, 10,000 realizations
of each of the calibration data points were first made by randomly varying each
point within its x and y uncertainties and fitting a separate regression line for each
set of δ¹¹B_borate and δ¹¹B_calcite values. This approach therefore accounts for the
covariation in slope and intercept uncertainty in the average δ¹¹B_borate–δ¹¹B_calcite
calibration line shown in Extended Data Fig. 8 (note that this covariation arises
because a fit to the data with a high slope has a low intercept, and vice versa). For
each of the 10,000 realizations of the down-core δ¹¹B records from the Monte Carlo
approach, one of these calibrations was randomly chosen and applied to the whole
time series (no calibration was chosen more than once). For each data point in the
ensuing pH time series, the maximum probability of the distribution of the 10,000
pH estimates was determined with uncertainties given by the 2.5th, 16th, 84th and
97.5th percentiles.
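The calibration part of this Monte Carlo scheme can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes Gaussian perturbations (with the quoted 2σ halved to give σ), an ordinary least-squares fit for each realization, and an arbitrary subset of the 300–355 μm rows of Extended Data Fig. 8a as calibration points. It stops at δ¹¹B_borate, because the conversion to pH (equation (2)) is not reproduced in this section, and it reports the median rather than the maximum-probability value.

```python
import random

# (borate, 2-sigma, calcite, 2-sigma) in permil: an illustrative subset of the
# 300-355 um rows tabulated in Extended Data Fig. 8a (the selection is an assumption).
CALIBRATION = [
    (18.89, 0.51, 16.37, 0.40),
    (19.22, 0.25, 17.36, 0.31),
    (17.49, 0.41, 15.04, 0.21),
    (18.21, 0.11, 16.84, 0.50),
    (18.14, 0.47, 16.49, 0.23),
    (18.73, 0.67, 16.96, 0.34),
    (17.98, 0.50, 15.52, 0.50),
]


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def perturbed_calibration(rng):
    """Jitter every calibration point within its x and y uncertainty, then refit."""
    xs = [rng.gauss(b, sb / 2.0) for b, sb, _, _ in CALIBRATION]
    ys = [rng.gauss(c, sc / 2.0) for _, _, c, sc in CALIBRATION]
    return fit_line(xs, ys)


def percentile(sorted_vals, p):
    """Nearest-rank percentile of an already sorted list."""
    return sorted_vals[round(p / 100.0 * (len(sorted_vals) - 1))]


def borate_ensemble(calcite, calcite_2sd, n=10000, seed=0):
    """Monte Carlo distribution of borate d11B for one calcite measurement."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        slope, intercept = perturbed_calibration(rng)  # one calibration per realization
        sample = rng.gauss(calcite, calcite_2sd / 2.0)  # analytical uncertainty
        out.append((sample - intercept) / slope)        # invert calcite = a*borate + b
    out.sort()
    return {p: percentile(out, p) for p in (2.5, 16, 50, 84, 97.5)}


if __name__ == "__main__":
    print(borate_ensemble(15.5, 0.25, n=2000))
```

Because slope and intercept are refitted together in every realization, their covariation is carried through automatically, which is the point made in the text about high-slope fits having low intercepts.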

To calculate pCO₂, another variable of the ocean carbonate system apart from pH
is required. Here we use total alkalinity. Both pH and pCO₂ are governed by the
ratio of dissolved inorganic carbon to total alkalinity in seawater, and so pH changes
are proportional to changes in pCO₂ (refs 20, 82). Although our knowledge of total
alkalinity is largely uncertain, modelling studies provide useful constraints on
glacial–interglacial (GIG) total alkalinity change. The GIG total alkalinity change
is +120 μmol kg⁻¹ in the ‘seven-box model’ of ref. 84, and is +140 μmol kg⁻¹ in
ref. 85. There is no available information, however, on the secular evolution of total
alkalinity during the deglaciation.

We calculated pCO₂ using the equations of ref. 20 and the ‘seacarb’ package in
R. The relevant input parameters were taken from the 10,000 Monte Carlo pH
simulations above with the addition of total alkalinity, which randomly varied for
each data point in the simulation from ‘modern minus 25 μmol kg⁻¹’ to ‘modern
plus 125 μmol kg⁻¹’ with a ‘flat’ probability (that is, an equal probability of total
alkalinity being any value between these extremes at any point of the record). This
approach avoids ascribing weight to any particular total alkalinity value and fully
explores the likely range given the available, model-based, constraints.

It is important to note that pCO₂ estimates are mostly determined by the reconstructed
pH and that total alkalinity has little influence. For example, assuming
that total alkalinity in both ODP1238 and PS2498-1 records is constant at either
‘modern plus 125 μmol kg⁻¹’ or ‘modern minus 25 μmol kg⁻¹’ modifies reconstructed
pCO₂ only by a maximum of ~22 μatm (Extended Data Fig. 9). Similarly, temperature
and salinity have little effect on our pH and pCO₂ calculations, which are overwhelmingly
determined by δ¹¹B and its associated uncertainties. The uncertainties
associated with these parameters for PS2498-1 and ODP1238 are respectively at
most ±19 and ±9 μatm for calibration uncertainty, ±10 and ±12 μatm for total
alkalinity, ±8 and ±15 μatm for salinity, and ±9 and ±13 μatm for temperature
(Extended Data Fig. 9).


To calculate ‘equilibrium pH’, that is, surface seawater pH expected at each core
location if waters had remained in equilibrium with the contemporaneous atmosphere
during the past 25 kyr (Fig. 2b, g), we used interpolated ice-core pCO₂ data
and the calculated temperature, salinity and total alkalinity for each of our samples
(see above). Calculations were made using the ‘seacarb’ package in R with the
constants of refs 67, 75, 76.

Here we define ΔpCO₂ as the partial pressure of CO₂ in seawater minus the partial
pressure of CO₂ in the atmosphere:

$$\Delta p\mathrm{CO}_2 = p\mathrm{CO}_2^{\mathrm{sw}} - p\mathrm{CO}_2^{\mathrm{atm}}$$

Our reconstructed ΔpCO₂ values for the most recent samples in PS2498-1 and ODP1238
are −3 ± 16 and +27 ± 17 μatm, respectively, which agree within uncertainties with
modern mean annual ΔpCO₂ from nearby locations (−15 ± 8 and +45 ± 8 μatm,
respectively; ref. 11). Because ΔpCO₂ is a comparison between our pCO₂ records
and the Antarctic ice-core pCO₂ record, the resulting ΔpCO₂ can be affected by the
chronology used, especially in periods of rapid pCO₂ increase. To account for this,
the uncertainty in ΔpCO₂ due to our age models has been approximated by propagating
a ±0.5 kyr uncertainty in our records using a Monte Carlo approach. The
resulting uncertainty in calculated ΔpCO₂ is at most ±12 μatm for PS2498-1 and
±7 μatm for ODP1238 (Extended Data Fig. 9).

To derive CO₂ flux between the ocean and the atmosphere, ΔpCO₂ and the sea–air
gas transfer rate (normally parameterized as a function of wind speed) need
to be considered. However, a visual comparison between our Fig. 1 and fig. 13 in
ref. 11 illustrates that areas of high ΔpCO₂ tend to correspond to areas of sea–air
CO₂ flux. Consequently, and in agreement with previous literature, in our interpretations
we assume that a positive ΔpCO₂ implies CO₂ outgassing from the ocean
to the atmosphere.

Smoothing of the records and Monte Carlo simulations. The δ¹¹B-derived and
δ¹³C records presented in this study have been smoothed by fitting a non-parametric
regression (LOESS function). Smoothing was performed in R with the degree of
smoothing (the ‘span’ term) optimized using both ‘general cross-validation’ and
‘leave-one-out cross-validation’ methods. These approaches identified nearly
identical optimal degrees of smoothing for the δ¹¹B-derived records (span = 0.25
for PS2498-1 and 0.23 for ODP1238), and the most likely smoothed fit to the data
was obtained using a probabilistic approach. To achieve this a LOESS function was
fitted to each of the 10,000 realizations (for pH, pCO₂ and ΔpCO₂) generated using
the Monte Carlo approach described above, and for each time step the distribution
of smoothed lines was examined and the maximum probability and the 2.5th, 16th,
84th and 97.5th percentiles were determined. This approach fully accounts for
the uncertainty in all of the input parameters, and provides an uncertainty in the
most likely smoothed fit to the data. Uncertainty envelopes in the ΔpCO₂ records
also include the uncertainty associated with our age models.
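The probabilistic smoothing step can be illustrated with a short Python sketch. It is an approximation under stated assumptions: the study used the LOESS function in R, whereas the hypothetical `loess` helper below is a plain degree-1 local regression with tricube weights written from scratch, and the time series in the usage example is synthetic. The median stands in for the maximum-probability value.

```python
import random


def loess(xs, ys, span=0.25):
    """Degree-1 local regression with tricube weights (a simplified LOESS)."""
    n = len(xs)
    k = max(4, int(round(span * n)))  # number of points in each local window
    fitted = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def smooth_ensemble(ages, realizations, span=0.25):
    """Smooth every realization, then take percentiles at each time step."""
    smoothed = [loess(ages, r, span) for r in realizations]
    bands = []
    for i in range(len(ages)):
        col = sorted(s[i] for s in smoothed)

        def pick(p):
            return col[round(p / 100.0 * (len(col) - 1))]

        bands.append((pick(2.5), pick(16), pick(50), pick(84), pick(97.5)))
    return bands


if __name__ == "__main__":
    rng = random.Random(0)
    ages = [0.5 * i for i in range(40)]            # synthetic ages (kyr)
    truth = [8.20 - 0.004 * a for a in ages]       # synthetic pH trend
    reals = [[t + rng.gauss(0, 0.02) for t in truth] for _ in range(200)]
    print(smooth_ensemble(ages, reals)[0])
```

Smoothing each realization before taking percentiles, rather than smoothing a single best-estimate series, is what lets the envelope reflect uncertainty in the smoothed fit itself.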

Hydrographic control on seawater pH in the SAA. Millennial-scale reorganizations
of the interhemispheric climate were accompanied by meridional shifts of
the oceanic fronts of the Antarctic circumpolar current. PS2498-1 sits between
the subtropical front and the sub-Antarctic front, the migrations of which may
have influenced surface ocean hydrography and carbonate system parameters at
this location. The main negative shifts in the PS2498-1 G. bulloides δ¹¹B record (and
pH) generally occurred during warming episodes, as revealed by the co-registered
G. bulloides Mg/Ca (Fig. 2d). This would agree with the contemporaneous southward
shifts of the subtropical front at a nearby location that are interpreted as the
response of the South Atlantic to the main episodes of interhemispheric climate
changes (for example during Heinrich Stadial 1 and the Younger Dryas). Today,
surface ocean pH decreases southwards across the Antarctic circumpolar current,
implying that a southward shift of the subtropical front would have plausibly
caused a pH increase (as opposed to the observed decrease). Hence, the pH
variability at PS2498-1 does not reflect reorganizations of the regional oceanic
fronts, because these, at most, would have worked to counteract or damp, or both,
the reconstructed acidification events.

Intertropical convergence zone migrations in the EEP. The mean latitudinal
position of the intertropical convergence zone (ITCZ) probably migrated southwards
during Northern Hemisphere cold phases. ITCZ southward shifts during
boreal winter currently coincide with decreased upwelling in the EEP (and vice
versa). The timing of the main shifts towards low δ¹¹B and pH (high ΔpCO₂) in the
ODP1238 record matched Northern Hemisphere cooling events and attendant ITCZ
southward displacements. However, they cannot be explained by variations in
EEP upwelling rates because reduced upwelling would cause lower ΔpCO₂, as opposed
to the observed higher ΔpCO₂. In addition, the appearance of low-pH waters in the
ODP1238 deglacial record is associated with considerably increased temperatures,
both in surface and, especially, in thermocline waters (Fig. 2i and Extended
Data Fig. 10), which cannot be fully explained by deglacial warming. In fact, these
positive temperature excursions at ODP1238 exceeded Holocene temperatures.
We therefore conclude that the ITCZ shifts cannot explain the observed carbonate
chemistry and temperature changes documented in our EEP records.

Oceanic tunnelling. Oceanic tunnelling is the transfer of intermediate waters (Antarctic
intermediate waters or sub-Antarctic mode waters), and of the geochemical
and thermal signals that they transport, from the Southern Ocean to the equatorial
Pacific. These intermediate waters entrain upwelled deep waters during their
formation in the high-latitude Southern Ocean, and during the last deglaciation
their chemical characteristics would therefore have been influenced by changes in
upwelling around Antarctica. After formation, they spread northwards (being modified
along their path) and feed the Pacific equatorial undercurrent, an eastwards-flowing
subsurface current that transports thermocline waters from the western
Pacific along the equator and upwells in the EEP. Therefore, this intermediate
water route provides an efficient connection between the Southern Ocean
and the EEP.
Extended Data Figure 2 | δ¹¹B-derived ΔpCO₂ compilation for the
equatorial Pacific during the last deglaciation and Holocene. Foraminifera-based
record from the western equatorial Pacific (grey), Porites coral-based
record from the central equatorial Pacific (as published in ref. 22, green),
and foraminifera-based record from the EEP (this study, red). The records of
refs 21, 22 have been smoothed by fitting a LOESS function with degrees of
smoothing (span) of 0.2 and 0.4, respectively (Methods), to allow a better
comparison with the ODP1238 record (see main text). ODP1238 is located in
the EEP, and therefore represents a direct record of upwelling of CO₂-rich
waters, while the signal at central and western equatorial sites may have been
modified during the westward transit of waters by, for example, equilibration
with the atmosphere and/or nutrient utilization by the biological pump.
Extended Data Figure 3 | Planktic δ¹⁸O and δ¹³C records from ODP1238.
a, Planktic δ¹⁸O records from ODP1238. Red, G. ruber sensu stricto (ss)
250–355 μm; green, G. sacculifer (mixed morphotypes) 355–425 μm; black,
N. dutertrei 355–500 μm. b, Planktic δ¹³C records from ODP1238. To facilitate
comparison between species, δ¹³C data have been normalized. Red, G. ruber ss
250–355 μm; green, G. sacculifer (mixed morphotypes) 355–425 μm; black,
N. dutertrei 355–500 μm.
d–f, EEP. a, δ¹¹B-derived ΔpCO₂ in core PS2498-1. b, Logarithm of the mass … de-meaned and divided by their own standard deviation, and are displayed in
Extended Data Figure 5 | Age model for PS2498-1. a, Chronology of SAA
core PS2498-1. Carbon-14 calendar age/depth relationships in core PS2498-1.
Grey shading indicates 95% confidence limits of calendar ages. b, PS2498-1
G. bulloides δ¹¹B record plotted using the different chronologies described in
Methods and compared with atmospheric CO₂ (green; refs 5, 87, 88) and with
Antarctic opal flux (orange) records. Red, constant ΔR = 300 yr; green,
ΔR = 900 yr for intervals older than 16 kyr and ΔR = 300 yr for younger
intervals; magenta, variable ΔR correction (ranging between 500 and 900 yr
between 13 and 16 kyr ago).
Extended Data Figure 6 | Age model for ODP1238. a, Radiocarbon ages for
ODP1238 determined from N. dutertrei tests at LLNL-CAMS. b, Chronology
of EEP core ODP1238. Orange circles, calendar ages; black line, linear fit;
red line, third-order polynomial fit (Methods).
… ODP1238. a, Benthic δ¹⁸O stratigraphy for ODP1238 compared with other
benthic δ¹⁸O stratigraphies from EEP cores. Black circles, unpublished benthic
δ¹⁸O data for ODP1238 generated by J. F. McManus (LDEO, Columbia
University); red line, site TR163-22; blue line, sites RC13-140, RC23-22 and …
with other G. ruber stratigraphies from EEP cores. Black circles, ODP1238
(Methods); green squares, site TR163-19; red line, site TR163-22; blue line,
site ODP 1240.



a
Site         Sample type                          Size (μm)  Lat (°N)  Lon (°E)  T (°C)  Salinity  pH     ±2σ    pK_B   δ¹¹B_calcite (‰)  ±2σ (‰)  δ¹¹B_borate (‰)  ±2σ (‰)
Fit          Core-top                             355-400    -48.95    174.98    9.08    34.33     8.191  0.003  8.794  15.86             0.21     17.49            0.41
MC577-17B    Core-top                             355-400    45.57     -17.40    15.32   35.70     8.188  0.002  8.709  16.65             0.58     18.34            0.62
Fit          Core-top                             250-300    -48.95    174.98    9.08    34.33     8.191  0.003  8.794  14.90             0.33     17.49            0.41
CAR22(Z)6    Sediment trap                        250-300    10.50     -64.66    24.17   36.70     8.066  0.018  8.597  16.23             0.35     18.21            0.11
MC577-17B    Core-top                             250-300    45.57     -17.40    15.32   35.70     8.188  0.002  8.709  16.43             0.58     18.34            0.62
MC436        Core-top                             300-355    39.80     -21.06    18.40   36.06     8.201  0.017  8.670  16.37             0.40     18.89            0.51
MC655        Core-top                             300-355    38.42     5.40      23.23   37.38     8.161  0.010  8.605  17.36             0.31     19.22            0.25
Fit          Core-top                             300-355    -48.95    174.98    9.08    34.33     8.191  0.003  8.794  15.04             0.21     17.49            0.41
TAN1106/38   Core-top ('flattened' morphotype)    300-355    -49.69    165.07    9.78    34.49     8.186  0.003  8.783  15.16             0.26     17.51            0.28
TAN1106/38   Core-top ('kummerform' morphotype)   300-355    -49.69    165.07    9.78    34.49     8.186  0.003  8.783  15.19             0.27     17.51            0.28
CAR22(Z)6    Sediment trap                        300-355    10.50     -64.66    24.17   36.70     8.066  0.018  8.597  16.84             0.50     18.21            0.11
ODP1172C     Core-top                             300-355    -43.96    149.93    13.72   35.02     8.196  0.004  8.732  16.49             0.23     18.14            0.47
MC577-17B    Core-top                             300-355    45.57     -17.40    15.32   35.70     8.188  0.002  8.709  15.48             0.24     18.34            0.62
IODP1313     Core-top                             300-355    41.00     -32.96    18.50   36.03     8.182  0.008  8.668  16.96             0.34     18.73            0.67
IODP1308     Core-top                             300-355    49.88     -24.24    13.22   35.41     8.183  0.001  8.736  15.52             0.50     17.98            0.50
ODP 980      Core-top                             300-355    55.49     -14.70    11.71   35.36     8.206  0.016  8.755  16.32             0.72     18.02            0.59
ODP 980      Core-top                             250-300    55.49     -14.70    11.71   35.36     8.206  0.016  8.755  16.24             0.62     18.02            0.59
IODP 1313    Core-top                             250-300    41.00     -32.96    18.50   36.03     8.182  0.008  8.668  17.05             0.35     18.73            0.67
IODP 1308    Core-top                             250-300    49.88     -24.24    13.22   35.41     8.183  0.001  8.736  15.38             0.45     17.98            0.50
IODP 1308    Core-top                             355-400    49.88     -24.24    13.22   35.41     8.183  0.001  8.736  15.04             0.44     17.98            0.50
Extended Data Figure 8 | δ¹¹B–pH calibrations for G. bulloides and
G. sacculifer. a, Data tabulated; b, data plotted. The green symbols and text
show a new calibration for G. bulloides (with associated 2σ uncertainties).
Horizontal error bars for core-top samples are 2σ of intra-annual variability in
calculated monthly δ¹¹B_borate and for sediment trap samples reflect the range of
δ¹¹B_borate between December 2006 and February 2007. Vertical error bars
represent the analytical reproducibility (2σ) as calculated using equation (1).
The most recent PS2498-1 sample (2.2 kyr old) (black-filled circle) was not used
in the calibration process, and is included to show its agreement with the
calibration line. The red symbols and text show a calibration for G. sacculifer
(with associated 2σ uncertainties). The calibration line incorporates both
culture (empty symbols) and core-top (red-filled symbols) data. Culture
data analysed by N-TIMS (grey symbols and text) have been corrected by
applying a laboratory offset of −3.32‰ (Methods) (the vertical grey arrow
indicates an original N-TIMS calibration data point that falls outside the plot
area). The ODP1238 late-Holocene average (black-filled square) was not
used to produce the calibration equation, and is included to show its agreement
with the calibration line. Horizontal error bars for core-top samples are 2σ of
intra-annual variability in calculated monthly δ¹¹B_borate and for culture
samples represent quoted uncertainties in pH. Vertical error bars represent
quoted uncertainties in δ¹¹B measurements (2σ). To calculate monthly pH
variations at ODP1238, the method described in the G. bulloides calibration
section has been used (with total alkalinity derived using the total
alkalinity/salinity/temperature relationship for the ‘Equatorial upwelling
Pacific Zone’ in ref. 72). The black line denotes a 1:1 relationship, that is, a pH
sensitivity equal to that of borate ion. Heavily and lightly shaded regions around
calibration lines represent 1σ and 2σ uncertainties, respectively.
Extended Data Figure 9 | Effect of δ¹¹B–pH calibration, total alkalinity and
chronological uncertainties in pCO₂ and ΔpCO₂ records. a, PS2498-1 δ¹¹B-based
pCO₂ record calculated with the G. bulloides calibration equation
(thick blue line), and its associated 2σ uncertainty (blue shaded envelope).
b, PS2498-1 δ¹¹B-based pCO₂ record assuming a constant total alkalinity of
(i) modern values at PS2498-1 (blue), (ii) modern values minus 25 μmol kg⁻¹
(green) and (iii) modern values plus 125 μmol kg⁻¹ (red). c, PS2498-1 ΔpCO₂
record calculated using (i) age derived from our age model (blue), (ii) age plus
0.5 kyr (green) and (iii) age minus 0.5 kyr (red). d, ODP1238 δ¹¹B-based
pCO₂ record calculated with the G. sacculifer calibration equation (thick red
line), and its associated 2σ uncertainty (shaded red envelope). e, ODP1238
δ¹¹B-based pCO₂ record assuming a constant total alkalinity of (i) modern
values at ODP1238 (blue), (ii) modern values minus 25 μmol kg⁻¹ (green)
and (iii) modern values plus 125 μmol kg⁻¹ (red). f, ODP1238 ΔpCO₂ record
calculated using (i) age derived from our age model (blue), (ii) age plus 0.5 kyr
(green) and (iii) age minus 0.5 kyr (red). Note the different horizontal and
vertical axes in each panel.
Extended Data Figure 10 | Globigerinoides sacculifer Mg/Ca-based sea surface temperature (SST; red)
and N. dutertrei Mg/Ca-based thermocline temperature (TT; green) at ODP1238.
Recharge of a subglacial lake by surface meltwater in northeast Greenland

Michael J. Willis, Bradley G. Herried, Michael G. Bevis & Robin E. Bell


In a warming climate, surface meltwater production on large ice
sheets is expected to increase. If this water is delivered to the ice sheet
base it may have important consequences for ice dynamics. For example,
basal water distributed in a diffuse network can decrease basal
friction and accelerate ice flow, whereas channelized basal water
can move quickly to the ice margin, where it can alter fjord circulation
and submarine melt rates. Less certain is whether surface
meltwater can be trapped and stored in subglacial lakes beneath large
ice sheets. Here we show that a subglacial lake in Greenland drained
quickly, as seen in the collapse of the ice surface, and then refilled
from surface meltwater input. We use digital elevation models from
stereo satellite imagery and airborne measurements to resolve elevation
changes during the evolution of the surface and basal hydrologic
systems at the Flade Isblink ice cap in northeast Greenland. During
the autumn of 2011, a collapse basin about 70 metres deep and about
0.4 cubic kilometres in volume formed near the southern summit of
the ice cap as a subglacial lake drained into a nearby fjord. Over the
next two years, rapid uplift of the floor of the basin (which is approximately
8.4 square kilometres in area) occurred as surface meltwater
flowed into crevasses around the basin margin and refilled the
subglacial lake. Our observations show that surface meltwater can be
trapped and stored at the bed of an ice sheet. Sensible and latent heat
released by this trapped meltwater could soften nearby colder basal
ice and alter downstream ice dynamics. Heat transport associated
with meltwater trapped in subglacial lakes should be considered
when predicting how ice sheet behaviour will change in a
warming climate.

Studies of cryo-hydrologic systems in west Greenland suggest that
supraglacial water that makes its way to the base of the ice sheet is
flushed to the ocean through rapidly evolving subglacial drainage networks.
The possibility of this water being captured and stored at the
base has largely been neglected. Until recently, subglacial lakes remained
undetected beneath Greenland. Two such lakes have now been identified
beneath the northwesternmost part of the ice sheet. These lakes
are surprising because subglacial lake water is usually produced at the
base of an ice sheet by geothermal and frictional melting but the ice
above the lakes is too slow, too thin and too cold to support basal melting.
The water in these subglacial lakes is hypothesized to originate
from nearby supraglacial lakes that periodically drain to the bed. If
Greenlandic subglacial lakes are replenished from the surface, the thermal
environment around such lakes could be warmer than expected.
Meltwater arriving at or near the base of the ice sheet will refreeze and
release its latent heat. This process will warm the basal ice, which could
change its rheology and thus affect downstream ice flow rates.

The ~8,500 km² Flade Isblink ice cap is situated close to Station Nord
on the Princess Dagmar peninsula (81.3° N, 15.0° W), northeast Greenland
(Fig. 1). Numerical weather model output indicates the average
annual surface temperature at the southern ice divide (the line at which
ice flows either one way or another; see Extended Data Fig. 3) is about
−22 °C and the long-term accumulation rate is ~0.25 m of water
equivalent per year. Using an estimated geothermal heat flux of
~60 mW m⁻², we calculate the temperature at the bed, about 540 m
(Extended Data Fig. 1) beneath the ice divide, to be approximately
−9 °C (ref. 20), far below the pressure melting temperature (about
−0.5 °C). The base of the Flade Isblink ice cap is too cold to allow local
production of basal meltwater. Surface meltwater is plentiful during the
summers on the south-facing slopes of the southern part of the ice cap
(see Fig. 1b). In 2006 supraglacial streams formed at high elevations just
north of the ice divide. In the summers of 2006 to 2011 these northerly
flowing meltwater streams always disappeared into a recurring moulin
(81° 09′ 23″ N, 16° 36′ 04″ W).

In the late summer of 2011, a deep basin surrounded by crevasses
formed on the surface of the Flade Isblink ice cap (Fig. 1, Extended Data
Figs 2 and 3) just to the north of the ice divide. The basin formed over a
21-day interval during which clouds obscured 250-m resolution MODIS
optical imagery. On 16 August 2011 the Flade Isblink ice cap’s ice divide
area had a smooth ice surface. On the next clear image on 6 September
2011, a ‘mitten’-shaped collapse basin had formed. The ‘mitten’ is composed
of two sub-basins separated by a shallow saddle. The larger,
~1.6-km-wide eastern main basin is centred on the location of the recurring
moulin. A 2-m-high ice ridge aligned along the basin axis probably
formed by compression during the initial slumping of the basin.
We interpret the formation of the collapse basin to be the result of the
sudden outburst of a subglacial lake that was overfilled by meltwater
draining from the surface.

Along-track stereo satellite imagery collected 8 months later on 3
May 2012 provides the first topographic measurements of the collapse
basin. At that time the basin had a maximum relief of more than 75 m
(Figs 1 and 2) and an area of about 8.4 km². We set the pre-collapse surface
elevations from a modified ICESat (Ice, Cloud and land Elevation
Satellite)-controlled InSAR (INterferometric Synthetic Aperture Radar)
digital elevation model (DEM) and calculate the basin volume to be
~0.4 km³. This basin volume is similar to many of the active lakes detected
with ICESat altimetry in Antarctica. There are no measurements
of ice inflow for the interval between the formation of the basin and our
first DEM. We therefore assume that the volume of the surface basin
observed in May 2012 is equal to the volume of water lost during the
2011 subglacial lake outburst event. It is likely that the original volume
of the basin was even larger. We calculate the minimum discharge rate
by dividing the volume of the basin by the maximum time interval that
elapses (21 days). We find that the subglacial lake emptied at a rate of
about 215 m³ s⁻¹, which is more than three times the average flow rate
of the River Thames in London and is also larger than drainage rates
observed from Antarctic subglacial lakes. The subglacial drainage path
for this flood can be estimated from hydrologic gradients, although
confidence in the result is limited by the large uncertainties in the bed
topography and the potential of an open conduit system. Hydraulic
potential based on bed topography and the InSAR DEM indicates
Figure 1 | Surface basin, Flade Isblink ice cap, northeast Greenland.
a, Composite satellite image of 70-m-deep basin. Supraglacial meltwater
flowing northwards from the ice divide is being intercepted by marginal
crevasses and routed to a subglacial lake. Scarring within the main basin is from
an earlier supraglacial meltwater incursion. A NASA IceBridge airborne laser
altimeter (lidar) profile acquired on 26 April 2013 is colour-coded to
show differences in elevation (ΔH) between the lidar and a corrected
WorldView-1 satellite DEM collected 9 days later, on 5 May 2013 (see scale on
right). 10-m contours of ice surface show the ellipsoidal elevation from
WorldView-1 DEM acquired on 17 May 2012. Background image is from
WorldView-2 on 14 August 2012. b, Regional setting. Purple lines show the
ice divides derived from ref. 21. The dotted line shows the surface melt limit
on background MODIS imagery from 11 July 2012. The inset shows location
within Greenland and the box shows location of panel a. WorldView-2
imagery, copyright 2012, DigitalGlobe, Inc.; MODIS imagery courtesy
of NASA.


that the floodwaters from the subglacial lake exited the ice cap to the
north, beneath the Marsk Stig Bræ marine-terminating grounded outlet
glacier, west of Station Nord (Fig. 1).
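The minimum discharge estimate above is a single division, sketched here in Python. The function name is hypothetical, and the inputs are the rounded values quoted in the text (about 0.4 km³ over at most 21 days), so the result is slightly higher than the quoted figure of about 215 m³ s⁻¹, presumably because the volume is rounded.

```python
SECONDS_PER_DAY = 86400.0


def mean_discharge_m3_per_s(volume_km3, days):
    """Average discharge needed to drain volume_km3 within the given number of days."""
    volume_m3 = volume_km3 * 1e9  # 1 km^3 = 10^9 m^3
    return volume_m3 / (days * SECONDS_PER_DAY)


if __name__ == "__main__":
    # Rounded basin volume and the 21-day cloud-obscured interval from the text.
    print(round(mean_discharge_m3_per_s(0.4, 21)))
```

Because 21 days is the longest the drainage could have taken, any shorter event would imply a proportionally higher rate, which is why the figure is a minimum.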

We tracked the evolution of both the collapse basin and the associated
surface drainage networks through the following two melt seasons
using stereo pairs of half-metre-resolution WorldView satellite imagery
over a 22-month period between May 2012 and March 2014 (for imagery
and DEM details and uncertainties see Extended Data Table 1).
Spatially varying elevation gains of between 2 m and 5 m occur in the
main basin (Fig. 2) early in the first melt season following the collapse.
We attribute these early season elevation gains (before 17 July 2012) to
drifting snow collecting in the basin and to ice inflow in response to
the formation of the basin. On 13 July 2012, the north-flowing supraglacial
meltwater network again formed close to the ice divide, ~5 km
to the southwest of the ‘mitten’. The supraglacial streams from this
network disappear down crevasses on the southern margin of the main
basin by 19 July 2012 (see Fig. 1 and Extended Data Fig. 3). Scars on the
basin floor indicate that the supraglacial streams occasionally bypassed
the crevasses and flowed into the body of the main basin (Fig. 1 and Extended
Data Fig. 3). The north-flowing supraglacial streams continued
to feed into the marginal crevasses through to 17 August 2012, the final
high-resolution non-stereo image available for 2012. Meltwater cannot
be seen in MODIS imagery on 30 August 2012.

During the two-week period of 31 July to 14 August 2012, meltwater
drained into marginal crevasses and the floor of the main basin rose
about 6 m (Fig. 3). This elevation gain is far faster than the 8 m yr⁻¹
(0.02 m per day) observed over subglacial Lake Whillans in West Antarctica.
The rapid uplift at the Flade Isblink ice cap (>0.40 m per day)
is the result of vertical motion localized in the basin floor (Fig. 2). This
rapid uplift rate occurs only when surface meltwater enters the crevasses
(Fig. 3). The uplift extends across the entire basin and is faster than
other uplift documented during the evolution of the Flade Isblink ice
cap basin. From observations of supraglacial meltwater entering crevasses
and late-melt season uplift of the basin floor, we infer that supraglacial
meltwater recharges the subglacial lake. From 17 May 2012 to 14
August 2012 the floor of the main basin rose a total of ~15 m.
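The uplift rates compared in this paragraph follow from simple unit conversions, as the short Python sketch below shows. It uses only the figures quoted in the text (about 6 m in 14 days, and 8 m per year at Lake Whillans); the 365.25-day year and the function name are assumptions made for illustration.

```python
DAYS_PER_YEAR = 365.25  # assumed year length for the conversion


def rate_m_per_day(uplift_m, days):
    """Mean uplift rate in metres per day."""
    return uplift_m / days


flade_rate = rate_m_per_day(6.0, 14.0)              # Flade Isblink, 31 July to 14 August 2012
whillans_rate = rate_m_per_day(8.0, DAYS_PER_YEAR)  # Lake Whillans, 8 m per year

if __name__ == "__main__":
    print(round(flade_rate, 2), round(whillans_rate, 2), round(flade_rate / whillans_rate, 1))
```

On these rounded inputs the Flade Isblink basin floor rose roughly twenty times faster than the surface above Lake Whillans.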

The ‘thumb’ basin evolved differently from the main basin during 
the 2012 melt season. In early July 2012 a melt lake formed within the 
smaller basin. The surface of this lake rose ~5 m by 27 July 2012. While 
rapid uplift began in the main basin over the next four days, the 
~140,000-m? lake in the ‘thumb’ sub-basin catastrophically drained, 
leaving behind a 3-4-m-high pile of ice rubble (Fig. 2). This type of 
block uplift is seen at other rapidly draining supraglacial lakes around 
Greenland.

Over the first winter (between mid-August 2012 and mid-April 2013), 
the main basin developed a pronounced topographic bulge. The floor 
of the basin rose an additional 15 m for a total of >30 m basin floor 
uplift relative to the basin elevation in May 2012 (Fig. 2). Between 17 
May 2012 and 5 May 2013 the volume of the surface basin decreased 
by 46.5(0.6) < 10° m®. (The parentheses indicate 2¢ uncertainties, the 
95% confidence interval.) Ice inflow accounted for about a third of this 
volume change, 15.2(1.2) x 10° m*, while a numerical weather model'® 
provides snow accumulation of 1.7(0.3) x 10° m? over this period. The 
majority of the volume loss, 29.6(1.1) x 10° m? [total basin volume re- 
duction — (ice inflow + snowfall volume)], is due to the uplift of the 
basin floor, which in turn is caused by the influx of supraglacial water 
into the subglacial lake. The numerical weather model’* predicts a re- 
markably similar amount of meltwater production, 31.8(6.4) 10° m? 
within the 57.2-km? catchment area that feeds into the crevasses around 
the collapse basin. The model predicts the end of surface meltwater 
production on 14 August 2012, while our observations show meltwater 
flowing for at least three more days. 
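
The volume budget described above can be checked with a few lines of arithmetic. The following is a minimal sketch using only the values quoted in the text (in units of 10⁶ m³); the variable names are illustrative and are not taken from the authors' processing code.

```python
# Volume budget for 17 May 2012 to 5 May 2013, in units of 10^6 m^3.
# All numbers are the values quoted in the text; names are illustrative.
total_basin_volume_loss = 46.5   # decrease in surface-basin volume
ice_inflow = 15.2                # volume supplied by ice flowing into the basin
snow_accumulation = 1.7          # volume supplied by snowfall (weather model)

# Residual attributed to uplift of the basin floor (recharge of the lake).
uplift_volume = total_basin_volume_loss - (ice_inflow + snow_accumulation)

# Modelled meltwater production in the catchment and its 2-sigma uncertainty.
modelled_meltwater = 31.8
modelled_meltwater_2sigma = 6.4

# Is the residual inside the 95% interval of the modelled meltwater volume?
within_model_range = (
    abs(uplift_volume - modelled_meltwater) <= modelled_meltwater_2sigma
)

print(f"uplift volume: {uplift_volume:.1f} x 10^6 m^3")
print(f"consistent with modelled meltwater: {within_model_range}")
```

The residual is the quantity the text attributes to recharge of the subglacial lake; the comparison with the modelled meltwater is only a consistency check, not an independent measurement.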

The northerly flowing supraglacial drainage network was absent dur- 
ing the cooler summer of 2013. Localized streams and pools occasion- 
ally formed on the floor of the main basin. Over the 2013 melt season, 
the floor of both sub-basins rose ~4 m. Over the 2013–2014 winter
(August 2013–March 2014) the basin floor rose about 4 m more and
the basin lost a further 16.8(0.4) × 10⁶ m³ of volume. The surface
lowering around the periphery of the ‘mitten’ (Fig. 2d and Extended Data
Fig. 4), which is probably partly due to ice flow into the basin, provides
about half of this volume change, 7.2(1.3) × 10⁶ m³. Snow accumulation
estimates are not available for this time period. Other causes of
the surface lowering adjacent to the basin may be the elastic response 
to the ice rising within the basin and to melting occurring around the 
edges of the subglacial lake during the injection of surface meltwater. 
Snow accumulation and ice flowing into the basin provide spatially varying,
relatively low rates of surface uplift that are overprinted by rapid
elevation gains that occur when meltwater flows into nearby crevasses
(Fig. 3).

Figure 2 | Elevation profiles through time. The dotted black lines in a to
c show the pre-collapse ice surface elevation modified from ref. 21. a, Repeat
elevation profile A to A’ (shown in d). Profile colour-indexed by time of satellite
DEM acquisition (scale on right for a to c). Maximum uplift of around 38 m is
seen at about 1,250 m from the start of the profile. The basin floor moves
upwards through time, in part owing to the recharge of the subglacial lake.
Uncertainties in elevation are typically ~0.5 m. b, Repeat elevation profile B to
B’ (shown in d). Maximum uplift of about 33 m occurs at about 1,000 m from
the start of the profile. c, Repeat elevation profile C to C’ (shown in d).
Blue-grey background shading indicates the elevation gain due to formation of
supraglacial lake and hatched shading indicates single block uplift episode in
response to drainage of a supraglacial lake. d, Examples of 3 m posting
along-track stereo satellite-derived DEMs showing elevation gains over time in
the basin. Star indicates location of elevation changes shown in Fig. 3. e, Change
in surface elevation in part caused by recharge of subglacial lake between
17 May 2012 and 24 March 2014. Maximum displacement is ~38 m in the
northern part of the main basin. Small amounts of subsidence occur around
the periphery of the basin owing to ice flow into the basin, melting at the
boundary of the subglacial lake and the elastic response of the surrounding ice
to the basin floor rising.

Figure 4 | Conceptual model of subglacial lake recharge from supraglacial
streams at the Flade Isblink ice cap. Graphic is not to scale. WorldView-2
surface imagery, copyright 2012, DigitalGlobe, Inc. Inset shows location on
Greenland. Labels in the graphic: In late 2011, the evacuation of a subglacial
lake caused a surface basin to form near the southern summit of the Flade
Isblink ice cap in northeast Greenland. In the 2012 melt season, supraglacial
streams entered nearby surface crevasses and recharged the subglacial lake,
causing the floor of the surface basin to rise. Surface basin formed between
16 August and 6 September 2011. In May 2012, the basin had an area of 8.4 km²
and maximum relief of ~70 m. Meltwater flows into surface crevasses. As the
subglacial lake refills, the floor of the surface basin rises a maximum of
~38 m. Basal ice properties altered owing to thermodynamic equilibrium and
latent heat transfer during freezing of subglacial lake water. Ice cap
thickness 540 m. Subglacial lake size is unknown, but accumulated ~30 million m³
of meltwater between 2012 and 2014.

Over about two years, the ice surface in the basin rose by up to 38 m 
and the basin lost a total volume of 67.3(0.4) × 10⁶ m³. About half of
this volume, 32.9(1.4) × 10⁶ m³, is due to ice flow into the basin, while
the remaining half, 34.4(1.5) × 10⁶ m³, is composed of volume gains at
the subglacial lake and the accumulation of snow. Our observations 
provide the first evidence of the direct recharge of a subglacial lake from 
surface meltwater (Figs 3 and 4). These results demonstrate that melt- 
water can be trapped and stored at the base of an ice cap. The subglacial 
environment in western Greenland is thought to react quickly to the 
transfer of surface meltwater to the bed. Our study indicates that in 
Greenland, and by inference at other ice sheets, this may not always be 
the case and that surface meltwater may be trapped for long periods. 
Supraglacial meltwater may fill some of the 400 or more subglacial 
basins predicted to lie beneath Greenland. If these basins are filled
from supraglacial melt, the basal ice in the vicinity of these lakes is
likely to be warmer and to have a lower viscosity than expected. Basal
warming combined with sporadic lake floods could modulate and possibly
accelerate downstream ice flow, with implications for the mass
balance of Greenland. The storage of surface meltwater in subglacial
lakes provides a new mechanism for transferring the impacts of a warming
atmosphere and increased surface melting to the critical base of the
ice sheet.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS

Ice depths. Ice depths were measured with the University of Kansas Multichannel 
Coherent Radar Depth Sounder (MCoRDS) ice-penetrating radar’! flown by NASA’s 
Operation IceBridge on 26 April 2013 (Extended Data Fig. 1). 

DEMs. WorldView DEMs were derived using the Ames Stereo Pipeline** applied 
to half-metre resolution, along-track satellite stereo image pairs acquired between 
3 May 2012 and 29 March 2014. As few of our DEMs extend over bedrock, we 
cannot vertically co-register these DEMs using fixed ground control points. Instead 
we use the IceBridge Airborne Topographic Mapper (ATM) swath laser altimetry*° 
from 26 April 2013 to correct a WorldView-1 DEM collected on 5 May 2013, nine 
days after the airborne mission. We remove the basin and nearby elevations from 
the ATM and WorldView elevation data and assume that the snow surface eleva- 
tions more than 100 m away from the basin did not change over the interval be- 
tween the ATM and the WorldView acquisitions. Differencing the surface heights 
indicates that the uncorrected satellite DEM is higher than the ATM data by 3.03
± 0.25 m. We remove this offset, and then vertically co-register all our other DEMs
to this ‘corrected’ WorldView DEM, normalizing our DEM heights.
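
The vertical co-registration step can be illustrated with a minimal sketch. It assumes that DEM and reference heights have already been sampled at the same stable locations away from the basin; the function names and the toy heights are hypothetical and are not part of the Ames Stereo Pipeline or the authors' code.

```python
from statistics import mean


def vertical_offset(dem_heights, reference_heights):
    """Mean height difference (DEM minus reference) over stable terrain."""
    diffs = [d - r for d, r in zip(dem_heights, reference_heights)]
    return mean(diffs)


def coregister(dem_heights, offset):
    """Remove a constant vertical offset from every DEM height."""
    return [h - offset for h in dem_heights]


# Toy heights (metres) at three stable points; values are made up.
atm_reference = [900.0, 905.0, 910.0]
worldview_dem = [903.0, 908.1, 913.0]

offset = vertical_offset(worldview_dem, atm_reference)
corrected_dem = coregister(worldview_dem, offset)
residual = vertical_offset(corrected_dem, atm_reference)

print(f"offset removed: {offset:.2f} m, residual mean: {residual:.2e} m")
```

A single constant offset is the simplest possible correction; it is adequate only if, as assumed in the text, the stable snow surface did not change between the two acquisitions.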

Features align almost perfectly over multiple satellite images, so we do not
horizontally co-register the DEMs or images. The 90% circular error of probability for
the half-metre imagery is <4.0 m for WorldView-1 and better than 3.5 m for
WorldView-2.

When differencing DEMs, initial outliers are removed by clipping elevation
differences to ±200 m. Remaining outliers are removed with an iterative 3-standard-deviation
filter that finds the mean and standard deviation, clips the elevation
differences to ±3 standard deviations centred on the mean, then re-calculates the
mean and standard deviation. We stop the iteration when the standard deviation
changes by less than 2% between iterations (typically 3 or 4 iterations are needed).
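
The outlier filter described above can be sketched as follows. This is an illustrative implementation under stated assumptions: the population standard deviation is used, the 2% criterion is applied to the relative change in standard deviation, and the function name and test values are hypothetical.

```python
from statistics import mean, pstdev


def iterative_sigma_clip(diffs, n_sigma=3.0, hard_limit=200.0,
                         tol=0.02, max_iter=50):
    """Clip elevation differences as described in the text.

    1. Discard values beyond +/- hard_limit metres.
    2. Repeatedly discard values beyond n_sigma standard deviations of the
       mean until the standard deviation changes by less than tol (2%).
    Returns the retained values, their mean and standard deviation.
    """
    values = [d for d in diffs if abs(d) <= hard_limit]
    mu, sd = mean(values), pstdev(values)
    for _ in range(max_iter):
        if sd == 0:
            break
        values = [v for v in values if abs(v - mu) <= n_sigma * sd]
        new_mu, new_sd = mean(values), pstdev(values)
        converged = abs(new_sd - sd) < tol * sd
        mu, sd = new_mu, new_sd
        if converged:
            break
    return values, mu, sd


# Toy elevation differences (metres) with two gross outliers.
toy_diffs = [0.1, -0.2, 0.0, 0.3, -0.1] * 10 + [50.0, 500.0]
kept, kept_mean, kept_sd = iterative_sigma_clip(toy_diffs)
print(len(kept), round(kept_mean, 3), round(kept_sd, 3))
```

Passing `n_sigma=6.0` gives the more lenient variant that the Methods apply outside the basin.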
Uncertainties. Standard deviations on the satellite-derived DEMs are provided in
Extended Data Table 1. ATM data have uncertainties that are about ±0.1 m (ref. 33).
When performing calculations we add the ATM uncertainty to the DEM standard
deviation in quadrature. In the worst case this provides an uncertainty of about
±0.6 m, which if spread across the entire ~8.4-km² basin equates to a volume
change of ~5.0 × 10⁶ m³. Additional uncertainty is from elevation changes that
occur through time at the Flade Isblink ice cap. Elevation changes between 2004 
and 2008 interpolated from ICESat tracks” near the location of the basin are about 
+0.5 m per year. We do not know whether this rate is representative of 2012-2014 
surface elevation changes away from the basin. 
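
As a worked example of the quadrature sum, the sketch below combines the largest DEM standard deviation listed in Extended Data Table 1 (0.58 m) with the ATM uncertainty and spreads the rounded result over the basin area. The variable names are illustrative.

```python
import math

dem_standard_deviation_m = 0.58   # largest value in Extended Data Table 1
atm_uncertainty_m = 0.1           # airborne laser altimetry uncertainty
basin_area_m2 = 8.4e6             # ~8.4 km^2 expressed in m^2

# Independent errors add in quadrature: sqrt(a^2 + b^2).
combined_uncertainty_m = math.hypot(dem_standard_deviation_m, atm_uncertainty_m)

# Worst case quoted in the text: about 0.6 m across the whole basin.
worst_case_volume_m3 = round(combined_uncertainty_m, 1) * basin_area_m2

print(f"combined height uncertainty: {combined_uncertainty_m:.2f} m")
print(f"equivalent volume: {worst_case_volume_m3:.2e} m^3")
```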

To estimate the correlation length scale for volume uncertainties we examine
several pairs of DEMs that extend over ~400 km² of bedrock to the north of the
study area. Agreement is excellent, with elevation differences typically in the ±1 m
range. We calculate how variance changes with increasing pixel sizes. The variance
over bedrock pixels becomes independent at a length scale L of 150 m. For each volume
calculation we use the formula 1.96U/√N, where U is the “volume of uncertainty”,
which is the combined DEM uncertainty multiplied by the total area A under
consideration. N is the number of independent pixels found by dividing A by L².
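
A minimal sketch of this volume-uncertainty calculation follows, reading the formula as 1.96·U/√N with U equal to the height uncertainty times the area A, and N = A/L². The function name is hypothetical, and the input values (0.6 m over 8.4 km²) are taken from the Uncertainties paragraph purely for illustration.

```python
import math


def volume_uncertainty_95(height_sigma_m, area_m2, correlation_length_m=150.0):
    """95% volume uncertainty, 1.96 * U / sqrt(N).

    U is the 'volume of uncertainty' (height uncertainty times area) and
    N is the number of independent pixels, area / correlation_length^2.
    """
    volume_of_uncertainty = height_sigma_m * area_m2
    n_independent = area_m2 / correlation_length_m ** 2
    return 1.96 * volume_of_uncertainty / math.sqrt(n_independent)


# Illustrative inputs: 0.6 m height uncertainty over the ~8.4 km^2 basin.
basin_uncertainty_m3 = volume_uncertainty_95(0.6, 8.4e6)
print(f"95% volume uncertainty: {basin_uncertainty_m3:.3e} m^3")
```

With these inputs the result is of order 0.5 × 10⁶ m³, the same order as the parenthesized uncertainties quoted for the basin volume changes.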
Ice inflow. Ice inflow into the basin is calculated by integrating ice surface eleva- 
tion changes that occur between the lip of the basin out to a distance of 2 km (about 
four ice thicknesses) from the basin. We assume that the elevation falls occurring 
in this region correspond to ice flowing into the basin, although there could also be 
a small component of elevation change caused by deflation of the snow surface and 
melting at the subglacial lake. 

Outliers are removed by clipping initial elevation differences to ±200 m. Residual
outliers are removed with an iterative 6-standard-deviation filter that finds the mean
and standard deviation, clips the elevation differences to ±6 standard deviations
centred on the mean, then re-calculates the mean and standard deviation. We stop
the iteration when the standard deviation changes by less than 2% between iterations
(typically 1 or 2 iterations are needed). We choose a more lenient filter outside
the basin than inside, to allow surface changes of up to 6 m adjacent to the basin 
into the calculation. The much larger area of small changes that occur farther away 
would otherwise dominate the calculation. 

Snowfall. We use the cumulative daily surface mass balances from the MAR numerical
weather model when available. The cumulative rate is converted to a thickness
using an average snow density of 175 kg m⁻³ found from MAR for the period
between 17 May 2012 and 5 May 2013. MAR uncertainties are set at 20%.
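
The conversion from cumulative surface mass balance to snow thickness can be sketched as below. The input mass balance (35 kg m⁻²) is a made-up value chosen for illustration, and applying the thickness over the 8.4-km² basin area is an assumption of this sketch rather than a statement of the authors' procedure.

```python
def snow_thickness_m(cumulative_smb_kg_m2, snow_density_kg_m3=175.0):
    """Convert cumulative surface mass balance (kg m^-2) to snow thickness (m)."""
    return cumulative_smb_kg_m2 / snow_density_kg_m3


hypothetical_smb_kg_m2 = 35.0     # illustrative value, not from the paper
basin_area_m2 = 8.4e6             # assumed area over which snow accumulates
relative_uncertainty = 0.20       # MAR uncertainty quoted in the text

thickness_m = snow_thickness_m(hypothetical_smb_kg_m2)
snow_volume_m3 = thickness_m * basin_area_m2
snow_volume_uncertainty_m3 = relative_uncertainty * snow_volume_m3

print(f"thickness: {thickness_m:.2f} m")
print(f"volume: {snow_volume_m3:.2e} +/- {snow_volume_uncertainty_m3:.1e} m^3")
```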

DEM availability. The DEMs used for this work are available at
http://www.pgc.umn.edu/elevation/stereo/flade. The DEMs are produced using the Ames Stereo
Pipeline (ASP), which is open source code available from NASA.
Extended Data Figure 1 | Operation IceBridge radar profile showing ice
depths near the southern summit of Flade Isblink ice cap, northeast
Greenland. Modified from a profile of the Multichannel Coherent Radar
Depth Sounder (operated by the University of Kansas) over the ‘thumb’ basin
collected on 26 April 2013. The NASA IceBridge flight line proceeds from
approximately northeast to southwest and is shown in Fig. 1. The purple dotted
line is the ice surface; the red dotted line is the ice/bed interface. The ‘thumb’
basin is at ~33 km along the flight line and shows an ice depth of ~540 m. The
propagation delay is the time for the radar to travel from the aircraft to the ice
and back; details of its conversion to depth can be found at the MCoRDS
technical page: ftp://data.cresis.ku.edu/data/rds/rds_readme.pdf. The dielectric
constant for ice (εr) used during the conversion from propagation delay to ice
depth is set to 3.15.
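
For orientation, the standard relation between two-way travel time in ice and ice thickness, depth = c·t / (2·√εr), can be sketched as follows. This is a generic illustration that considers only the portion of the delay spent inside the ice; it is not the MCoRDS processing chain, and the function names are hypothetical.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def ice_depth_m(two_way_time_in_ice_s, dielectric_constant=3.15):
    """Ice thickness from the two-way travel time spent inside the ice."""
    wave_speed = SPEED_OF_LIGHT_M_S / math.sqrt(dielectric_constant)
    return wave_speed * two_way_time_in_ice_s / 2.0


def two_way_time_s(depth_m, dielectric_constant=3.15):
    """Inverse relation: two-way travel time for a given ice thickness."""
    wave_speed = SPEED_OF_LIGHT_M_S / math.sqrt(dielectric_constant)
    return 2.0 * depth_m / wave_speed


# Round trip for the ~540 m ice depth reported at the 'thumb' basin.
delay_s = two_way_time_s(540.0)
recovered_depth_m = ice_depth_m(delay_s)
print(f"two-way delay in ice: {delay_s * 1e6:.2f} microseconds")
print(f"recovered depth: {recovered_depth_m:.1f} m")
```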
Extended Data Figure 2 | Surface collapse basin near the ice divide of Flade Isblink ice cap, northeast Greenland. Photographed from the north by
M. Studinger, NASA Operation IceBridge, on 26 April 2013.
Extended Data Figure 3 | Collapse basin and northward-flowing
supraglacial stream network near the ice divide of Flade Isblink ice cap, 
northeast Greenland. Supraglacial water disappears into crevasses at the 
edge of the basin. WorldView-2 multispectral image from 14 August 2012. 
Imagery copyright 2012, DigitalGlobe Inc. 
Extended Data Figure 4 | Elevation changes next to the basin. a, Repeat
elevation profile S to S’, adjacent to the basin, shown in b. Three to six metres of
subsidence is seen within a kilometre of the rim of the basin. A crevasse
observed in 2012 closes by 2013. The dotted black line is the pre-collapse ice
surface elevation modified from ref. 21. The profile is colour-indexed by time of
satellite DEM acquisition (scale on right). Uncertainties in elevation are
typically less than ~0.5 m. b, Location of profile on WorldView DEM.
Extended Data Table 1 | Details of satellite along-track stereo pairs

Date          Image-ID1         Image-ID2         Satellite  Mean (m)  Median (m)  Standard deviation (m)  Number of samples (×10⁶)
03 MAY 2012   102001001ACA2000  102001001AB6D300  WV01       -3.54     -3.64       0.57                     5.84
11 MAY 2012   102001001B090E00  102001001A49ED00  WV01       +0.04     +0.05       0.58                     9.08
17 MAY 2012   102001001B25B800  102001001B6C0B00  WV01       -2.43     -2.45       0.44                    18.67
15 JUN 2012   10300100192B8E00  1030010019346100  WV02       -3.11     -3.13       0.41                    10.12
10 JUL 2012   103001001A3C4100  103001001AB78700  WV02       -4.72     -4.72       0.42                    10.06
17 JUL 2012   1030010019553F00  103001001AA24600  WV02       -7.90     -7.91       0.38                     9.25
19 JUL 2012   103001001B7CE000  10300100194C6600  WV02       -4.72     -4.73       0.37                     9.94
20 JUL 2012   103001001B318B00  103001001A279800  WV02       -4.27     -4.27       0.41                     8.65
27 JUL 2012   103001001A401E00  103001001AC27F00  WV02       -5.88     -5.88       0.37                     9.91
31 JUL 2012   103001001A738F00  103001001AAC0B00  WV02       +2.81     +2.81       0.37                     9.06
07 AUG 2012   103001001BCC9B00  103001001BC8BE00  WV02       +1.82     +1.80       0.36                     9.89
14 AUG 2012   103001001B437500  103001001B07CC00  WV02       -4.97     -4.98       0.41                    10.07
24 APR 2013   1030010022377600  10300100212B7B00  WV02       +1.70     +1.69       0.33                     8.17
05 MAY 2013*  1030010022CC4100  10300100220CF000  WV02       +3.03     +3.03       0.25                     0.30
04 JUN 2013   1030010023C32600  1030010023017F00  WV02       -1.58     -1.62       0.43                     7.18
24 JUL 2013   1030010024218800  1030010024795500  WV02       -1.63     -1.65       0.31                    18.43
26 JUL 2013   1030010025C35700  103001002551AB00  WV02       -1.77     -1.78       0.26                    17.62
29 JUL 2013   1030010025215900  1030010025966A00  WV02       -3.74     -3.77       0.36                    16.57
13 AUG 2013   10300100253F7500  10300100253F7500  WV02       -6.51     -6.53       0.38                    14.88
31 AUG 2013   1030010026470C00  1030010027CA3A00  WV02       -0.66     -0.67       0.32                    16.77
24 MAR 2014   102001002D5FEB00  102001002D4DD100  WV01       +4.97     +4.95       0.41                    13.80
29 MAR 2014   103001002E6B0A00  103001002FB4F000  WV02       +7.88     +7.89       0.36                     8.07

*The statistics for this DEM are calculated against a NASA Operation IceBridge laser altimetry swath flown over the area on 26 April 2013, nine days earlier (ref. 30).

Identification numbers are DigitalGlobe imagery codes. The images used are from along-track WorldView-1 and -2 half-metre panchromatic stereo imagery (WV01 and WV02). Mean, median and the standard
deviation of elevation differences are computed by subtracting the 5 May 2013 DEM from the DEM in question for distances more than 100 m from the lip of the basin. The 5 May 2013 DEM is corrected to a NASA
ATM laser altimetry swath collected on 26 April 2013 (ref. 30). The number of 3-m-resolution pixels used in the calculation of mean, median and standard deviation is given in the final column.
RBM3 mediates structural plasticity and protective
effects of cooling in neurodegeneration


Diego Peretti¹, Amandine Bastide¹, Helois Radford¹, Nicholas Verity¹, Colin Molloy¹, Maria Guerra Martin¹, Julie A. Moreno¹,
Joern R. Steinert¹, Tim Smith¹, David Dinsdale¹, Anne E. Willis¹ & Giovanna R. Mallucci¹,²


In the healthy adult brain synapses are continuously remodelled 
through a process of elimination and formation known as struc- 
tural plasticity’. Reduction in synapse number is a consistent early 
feature of neurodegenerative diseases”’, suggesting deficient com- 
pensatory mechanisms. Although much is known about toxic pro- 
cesses leading to synaptic dysfunction and loss in these disorders””’, 
how synaptic regeneration is affected is unknown. In hibernating 
mammals, cooling induces loss of synaptic contacts, which are reformed
on rewarming, a form of structural plasticity**. We have found that 
similar changes occur in artificially cooled laboratory rodents. Cooling 
and hibernation also induce a number of cold-shock proteins in the 
brain, including the RNA binding protein, RBM3 (ref. 6). The rela- 
tionship of such proteins to structural plasticity is unknown. Here 
we show that synapse regeneration is impaired in mouse models of 
neurodegenerative disease, in association with the failure to induce 
RBM3. In both prion-infected and 5XFAD (Alzheimer-type) mice’, 
the capacity to regenerate synapses after cooling declined in parallel 
with the loss of induction of RBM3. Enhanced expression of RBM3 
in the hippocampus prevented this deficit and restored the capacity 
for synapse reassembly after cooling. RBM3 overexpression, achieved 
either by boosting endogenous levels through hypothermia before 
the loss of the RBM3 response or by lentiviral delivery, resulted in 
sustained synaptic protection in 5XFAD mice and throughout the 
course of prion disease, preventing behavioural deficits and neuronal
loss and significantly prolonging survival. In contrast, knockdown 
of RBM3 exacerbated synapse loss in both models and accelerated 
disease and prevented the neuroprotective effects of cooling. Thus, 
deficient synapse regeneration, mediated at least in part by failure of 
the RBM3 stress response, contributes to synapse loss throughout the 
course of neurodegenerative disease. The data support enhancing 
cold-shock pathways as potential protective therapies in neuro- 
degenerative disorders. 

We used the phenomenon of physiological structural plasticity seen 
in hibernating mammals to determine the capacity for synapse regen- 
eration in mouse models of neurodegenerative disease. When they enter 
torpor, the neurons of hibernators undergo morphological changes in- 
cluding changes in spine morphology*” and/or changes in connectivity*”. 
These are rapidly reversed on regaining normal body temperature***"°. 
We first established that the phenomenon of synapse dismantling and
reassembly (structural plasticity) on artificial cooling and rewarming
occurs in laboratory mice (Fig. 1a and Extended Data Fig. 1a). We then
explored the capacity for structural plasticity after cooling in two mouse
models of neurodegenerative disease: prion disease and the 5XFAD model
of Alzheimer’s disease. We used tg37⁺/⁻ mice infected with Rocky
Mountain Laboratory (RML) prions used in our previous studies.
These mice show substantial synapse loss from 7 weeks post-inoculation
(w.p.i.); 5XFAD mice have synapse loss from 4 months, after which
time learning deficits emerge. We tested the capacity for structural
plasticity using cooling early in the course of disease, before the onset of
established synapse loss in both models: from 4 w.p.i. in prion-infected
animals and from 2 months of age in 5XFAD mice.

Figure 1 | The capacity for synaptic regeneration is lost early in
neurodegenerative disease. a, Synapse numbers decline on cooling and
recover on rewarming in wild-type mice, counted in both 3D and 2D.
Representative electron micrographs (pseudo-coloured for ease of synapse
identification: yellow, presynaptic; green, postsynaptic compartments) and bar
charts showing quantification are shown for each experiment (n = 4 animals at
18 °C and n = 2 at 37 °C; 192 images from 2 mice per condition for 3D analyses;
93 images from 3 animals per condition, for 2D analyses). A typical tripartite
synapse is shown at higher magnification. b, c, The same response is seen in
prion-diseased mice (b) at 4 and 5 w.p.i. but this fails at 6 w.p.i. (arrow), and in
5XFAD (c) mice, where it fails at 3 months (arrow). ***P < 0.0001, **P < 0.01;
NS, not significant. Student’s t-test; two tailed. All data in bar charts are
mean ± s.e.m. Scale bar, 1 μm. Source Data for all figures can be found in the
Supplementary Tables.

Figure 2 | Failure to induce RBM3 parallels lost capacity for synaptic
recovery in neurodegenerative disease models. a, Cooling induces increased
RBM3 levels in hippocampi of wild-type mice. b, c, The response fails at 6 w.p.i.
in prion-infected mice (b) and at 3 months in 5XFAD mice (c) (arrows).
Representative western blots are shown. Bar graphs show quantification of
RBM3 levels relative to GAPDH (n = 6–11 mice per time point; all
experiments in triplicate). **P < 0.01, *P < 0.05, Mann–Whitney U-test in
a and c, Student’s t-test in b. All data are mean ± s.e.m.
All mice were cooled to 16–18 °C for 45 min, similar to core temperatures
reached in small hibernators (deep hypothermia), using the biomolecule
5′-adenosine monophosphate (5′-AMP), after which they
were allowed to slowly rewarm. Animals were euthanized at each stage
of the cooling–rewarming process and synapses were counted in the
CA1 region of hippocampus. Both synapse density and total synapse
number significantly declined on cooling, but recovered on rewarming
in wild-type mice, as measured using both three-dimensional (3D) and
two-dimensional (2D) analyses (Fig. 1a). Neither brain volume nor synapse
size changed on cooling and rewarming, excluding the possibility
that changes in synapse density reflected changes in these parameters
(Extended Data Fig. 1a). Thus, wild-type mice showed synaptic structural
plasticity with reduction in synapse number on cooling and recovery
on rewarming (Fig. 1a). This capacity for plasticity was also seen in
both prion-infected and 5XFAD mice very early in the course of disease,
at 4 and 5 w.p.i., and at 2 months of age, respectively (Fig. 1b, c). However,
this capacity was lost by 6 w.p.i. in prion-diseased mice (Fig. 1b
and Extended Data Fig. 1b) and at 3 months in 5XFAD mice (Fig. 1c and
Extended Data Fig. 1c). Notably, impaired structural plasticity shortly
preceded established decline in synapse number seen in prion-infected
tg37⁺/⁻ mice at 7 w.p.i. (ref. 14), and in the 5XFAD mice from 4 months
of age (see schematic, Extended Data Fig. 1d). The lost ability to reassemble
synapses was not due to loss of synaptic proteins at this stage (Extended
Data Fig. 2) nor to increased levels of disease-specific misfolded
prion protein (PrPSc) in prion-infected mice, or of amyloid-β oligomers
in 5XFAD mice induced by the cooling–rewarming process (Extended
Data Fig. 3).

Figure 3 | Early cooling induces RBM3 overexpression and is
neuroprotective in prion-infected mice. a, b, Cooling at 3 and 4 w.p.i. resulted
in sustained high levels of RBM3 in hippocampus for several weeks (n = 3–8
mice per time point) causing marked protection (b) of synapse number at 7, 8
and 9 w.p.i. (62 images from 2 mice per time point). Scale bar, 1 μm. c, d, Early
cooling maintained synaptic transmission (n = 4–8 cells from 2 mice per time
point; representative raw traces of evoked EPSCs are shown) and prevented
decline in burrowing behaviour and loss of novel object recognition memory
(d), expressed as ratio of exploratory preference (n = 10 mice per group) in
contrast to un-cooled mice. e, Haematoxylin and eosin stained sections show
striking reduction in hippocampal spongiosis and protection of CA1 neurons
(bar chart) in cooled mice that is abolished by RBM3 knockdown (n = 4–6
animals per treatment, except LV-shRNA-RBM3: n = 2). Scale bar, 50 μm.
One-way ANOVA, Brown–Forsythe test with Tukey’s post hoc analysis for
multiple comparisons was used in d and e. f, Early cooling significantly
prolonged survival, but this was abolished by knockdown of RBM3 (n = 31
cooled mice; n = 17 not cooled; n = 10 cooled + RNAi of RBM3). Mann–Whitney
U-test. *P < 0.05; **P < 0.01; ***P < 0.001. Two-tailed Student’s
t-test was used unless otherwise stated. All data in bar charts are mean ± s.e.m.

In hibernation and hypothermia, global protein synthesis and cell 
metabolism are downregulated, but low temperature also induces a small 
subset of proteins known as cold-shock proteins that escape translational 
repression’”"*. Amongst these, RNA-binding motif protein 3 (RBM3) 
and cold-inducible RNA binding protein (CIRP, also known as CIRBP) 
are cold-shock proteins expressed at high levels in brain®’’. We found 
strong induction of RBM3 by cooling in brains of wild-type mice and in 
mice with prion disease at 4 w.p.i. and 5XFAD mice at 2 months (Fig. 2).
CIRP was not upregulated (Extended Data Fig. 4). However, both prion 
and 5XFAD mice lost the capacity to upregulate RBM3 after cooling at 
6 w.p.i. (Fig. 2b) and at 3 months of age (Fig. 2c), respectively, in parallel 
with the lost ability to reassemble synapses after cooling at these time 
points (Fig. 1b, c). Therefore, we asked if induction of RBM3 expression 
drives synaptic recovery. 

Therapeutic hypothermia is a powerful neuroprotectant in brain injury 
acting through multiple mechanisms, including enhanced gene express- 
ion driving regenerative processes enhancing synapse formation (see 
ref. 17 for a review). RBM3 has been implicated in protection against 
cell death in various in vitro models of cooling and neuroprotection”, 
albeit in conditions of mild hypothermia (~32 °C). It is known to in- 
crease local protein synthesis at dendrites’ and global protein synthesis 
through ribosomal subunit binding and/or microRNA biogenesis”. The 
neuroprotective effects of hypothermia on neurodegenerative disease 
are unknown, however. Given that the capacity for structural plasticity 
correlated with induction of RBM3, we asked if raising endogenous RBM3 
levels through early therapeutic cooling would restore failed synaptic plas- 
ticity. In wild-type mice, a single episode of cooling to 16-18 °C raised 


RBM3 levels in brain for up to 3 days (see Fig. 2a and Extended Data 
Fig. 4c), suggesting the response is sustained for some time after the cold 
stress. Animals were cooled twice: at 3 w.p.i. and again at 4 w.p.i., resulting 
in a sustained several-fold increase in RBM3 expression up to 6 weeks 
later, declining to baseline levels at 12 w.p.i., at the terminal stage of dis- 
ease (Fig. 3a and Extended Data Fig. 5). Control mice were infected with 
prions but were not cooled. Early cooling and associated increased RBM3 
expression protected against synapse loss in prion disease at 7, 8 and 
9 w.p.i. (Fig. 3b), restored synaptic transmission (Fig. 3c) and prevented 
behavioural deficits, maintaining burrowing behaviours and novel object 
recognition memory (Fig. 3d and Extended Data Fig. 6a). There was 
also marked neuronal protection in the hippocampus (Fig. 3e, compare 
subpanels ii and iii), even in mice succumbing to prion infection, which 
is ultimately overwhelming due to other toxic effects. Most remarkably,
early cooling significantly increased survival in prion-infected
mice (91 ± 7 days in cooled mice vs 84 ± 4 days for uncooled mice;
P = 0.0002). Indeed, one animal survived 117 days post-infection,
nearly a 50% increase in life expectancy (Fig. 3f). Mice cooled later
in prion disease, at 5 and 6 w.p.i., when the RBM3 induction response 
is lost (see Fig. 2b), did not show increased survival (Extended Data 
Fig. 7). As predicted, RBM3 knockdown by lentivirally mediated RNA 
interference (RNAi) in the hippocampus abolished the protective effects 
of early cooling on CA1 pyramidal neurons and spongiform change 
(Fig. 3e, subpanel iv), on object recognition memory (Extended Data 
Fig. 6b, c), and on survival (Fig. 3f). As before, misfolded PrP levels 
were not affected by cooling (Extended Data Fig. 8). In therapeutic human 
hypothermia, temperatures of ~34 °C are used, similar to those of hibern- 
ating large mammals such as bears, which are known to induce similar 
transcriptional changes in RBM3 (ref. 6). Therefore, these physiological 
changes in small rodents at 16–18 °C may well be relevant in therapeutic
human hypothermia. Cooling of mice to a higher core temperature of
26–28 °C was similarly protective in prion disease, extending survival
(Extended Data Fig. 9).

We next asked if RBM3 overexpression alone, in the absence of cool- 
ing, was similarly neuroprotective. We overexpressed, or knocked-down, 
RBM3 in both hippocampi of mice by stereotaxic injection of lenti- 
viruses LV-RBM3 and LV-shRNA-RBM3, respectively. LV-RBM3 pro- 
duced a threefold increase in RBM3 levels compared to controls up 
to 8 weeks post-injection; whereas knockdown by LV-shRNA-RBM3 
reduced RBM3 levels to 30% of control levels (Fig. 4a). LV-RBM3 treat- 
ment, but not LV-control, rescued the early deficit in synapse reassembly 
in both prion-infected and 5XFAD mice at 6 w.p.i. and 3 months, re- 
spectively (Fig. 4b). Furthermore, LV-RBM3 was associated with marked, 
sustained neuroprotective effects in prion-infected mice: preventing 
synapse loss (Fig. 4c), synaptic transmission decline (Fig. 4d) and memory
and behavioural impairments (Fig. 4e). In vitro, RBM3 has been
shown to promote translation. Increased global protein synthesis
rates are profoundly neuroprotective, rescuing synapse number in prion
disease. We found that RBM3 overexpression rescued levels of global
translation, whereas RBM3 knockdown further reduced them, in
prion-infected mice at 9 w.p.i. (Fig. 4f) suggesting that this action of 
RBM3 along with preferential translation of specific RBM3-bound 
mRNAs, contributes to the synapse regeneration process. LV-RBM3 
treatment reduced prion neuropathology and prevented neuronal loss 
(compare subpanels ii and iii in Fig. 4g) and significantly extended sur- 
vival of prion-infected animals (Fig. 4h). This was not associated with 
changes in levels of PrPSc, which were not affected by overexpression of
RBM3 (Extended Data Fig. 8). Knockdown of RBM3, in contrast, ac- 
celerated synapse loss and memory and behavioural deficits (Fig. 4c-e), 
accelerating neuronal loss (Fig. 4g, subpanel iv) and significantly short- 
ening survival (Fig. 4h). 5XFAD mice do not allow similar examination 
of long-term effects of RBM3 overexpression as evolution of deficits and 
neuronal loss takes many months, and life expectancy is normal. How- 
ever, RNAi of RBM3 accelerated onset of synaptic loss in 5XFAD mice, 
which was now seen at 3 months (Extended Data Fig. 10a), suggesting 
that RBM3 has a long-term protective role in structural plasticity in these 
mice also. RBM3 knockdown also reduced synapse number and novel 
object memory in wild-type mice (Extended Data Fig. 10b); thus, it is
likely to be involved in synaptic maintenance under normal physiological
conditions.

In conclusion, we have shown that early synapse loss in mouse mod- 
els of neurodegenerative disease results, at least in part, from defective 
synaptic repair processes associated with failure to induce the cold- 
shock RNA-binding protein, RBM3. This results in impaired synaptic 
reassembly after cooling, but also appears to be important in the con- 
text of protecting against ongoing synaptic toxicity during disease, and 
in synaptic maintenance in wild-type mice. Our data suggest that fur- 
ther understanding the mechanisms of action of cold-shock proteins 
such as RBM3 may yield insights into endogenous repair processes and 
bring new therapeutic targets for neuroprotection in neurodegenera- 
tive disease. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


Animals. All animal work conformed to UK regulations and institutional guidelines
and was performed under UK Home Office guidelines.

Prion infection of mice. Three-week-old tg37+/− mice were inoculated intracerebrally
into the right parietal lobe with 30 µl of 1% brain homogenate of Chandler/
RML (Rocky Mountain Laboratories) prions under general anaesthetic, as described.
Animals were culled when they developed clinical signs of scrapie, as defined previously.
Control mice received 1% normal brain homogenate.

5XFAD mice. Founder 5XFAD mice were obtained from the Jackson Laboratory
(Bar Harbor, ME, USA). The 5XFAD mice have the following five mutations: Swedish
(K670N and M671L), Florida (I716V) and London (V717I) in human APP695 and
human PS1 cDNA (M146L and L286V) under the transcriptional control of the
neuron-specific mouse Thy-1 promoter. Colonies were maintained by crossing hemizygous
transgenic mice to wild-type littermates.

Induction of hypothermia. FVB, tg37+/− and 5XFAD mice weighing ≥20 g were
cooled using 5′-AMP as described, with slight modifications. Freshly prepared
5'-AMP (Sigma) was injected intraperitoneally (0.7 mg per g). Control mice were 
injected with saline. Mice were maintained at room temperature until core body 
temperature decreased to 25 °C (approximately 60 min). Subsequently, mice were 
transferred to a refrigerator (5 °C) and core body temperature lowered to 16-18 °C 
for 45 min. Mice recovered normal body temperature at room temperature con- 
ditions. Cooled samples were collected at the end of the 16-18 °C period and rewarmed 
samples as stated elsewhere in the text. 

Electron microscopy data acquisition and analysis of synapse number. Male
mice were used to avoid the effects of the oestrus cycle on synapse number. Brains
were perfusion fixed with 2% glutaraldehyde + 2% paraformaldehyde in 0.1 M
sodium cacodylate buffer (final pH 7.3). Slices (300 µm) were prepared using a
vibrating blade microtome (Leica Microsystems, Milton Keynes, UK). These slices
were post-fixed in 1% osmium tetroxide + 1% potassium ferrocyanide, stained en
bloc with 5% uranyl acetate and embedded in epoxy resin (TAAB Laboratories
Equipment Ltd, Aldermaston, UK), as described.

For random sampling and calculation of synapse density, estimation of total
synapse number and measures of volume of tissue using the stereology disector
method, the following procedures were used, as described previously. Fixed and
stained slices were flat embedded as described. The slices were mounted for microtomy,
a semi-thin (1 µm) section was cut from the upper surface of each slice and
the area of the stratum radiatum (sr) measured using the Cavalieri estimator (points
grid), as described. These results were then used to estimate the volume of the
stratum radiatum and for the systematic random selection of regions for electron
microscopy, as described. Briefly, the grid has intersections every 50 pixels (at a scale
at which 1 pixel is 1 µm). Each intersection was labelled with a dot. The dots were
numbered in each of the sections generated per hippocampus. These numbers
were used to measure the area of each section and to generate the volume data
by applying the following formula: $V_{\mathrm{srCA1}} = \sum P_{\mathrm{srCA1}} \times A_p \times T$ (ref. 16).
The total number of dots per hippocampus was divided by 6 (number of samples 
selected to obtain representative synapse variability from the whole hippocampus). 
The number obtained was used for the generation of a random number for sam- 
pling the first position and defined the interval for the subsequent 5 positions to be 
sampled. These regions were trimmed down, mounted on aluminium pins and
imaged by serial block-face scanning electron microscopy in a FEI Quanta FEG
250 electron microscope (FEI Ltd, Cambridge, UK) equipped with a ‘3View2XP’
system (Gatan Ltd, Abingdon, UK). Images of 32 serial sections were generated at
each of the six points. Serial sections of areas 51.4 µm² were recorded at ×20,000
magnification and an accelerating voltage of 3 kV, using spot size 3.0, at a working
distance of 6.1 mm, and a dwell time of 5 µs per pixel. The mean section thickness
was estimated by sectioning only half of the original block face. The block was then
re-embedded, sectioned orthogonally and the depth removed by ‘3View’ was measured
by transmission electron microscopy. The mean thickness of 3View sections was
86 nm. We analysed synapses in a volume stack of 88 µm³ (with an area of 33 µm²,
a measured section thickness of 0.086 µm and 31 sections). Synapses that had their
first identifiable profile below the first section in the series were counted. Synapses
were identified within a counting frame of 5.75 µm × 5.75 µm, which follows the
counting frame rules to avoid edge effects in the estimation of synapse numerical
density. The total number of synapses was estimated from the synapse density and
the volume of each hippocampus, as described.
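
The stereological arithmetic in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names, the point counts and the section spacing T are hypothetical. Only the six sampling positions, the 50-pixel grid at 1 µm per pixel, the 5.75 µm counting frame, the 0.086 µm section thickness and the 31 sections come from the text. The volume follows the Cavalieri form V = ΣP × Ap × T, read here with its usual meaning (points counted, area per point, spacing between sections).

```python
import random


def cavalieri_volume(points_per_section, area_per_point_um2, spacing_um):
    """Cavalieri estimate V = sum(P) * Ap * T, in cubic micrometres."""
    return sum(points_per_section) * area_per_point_um2 * spacing_um


def systematic_random_positions(total_points, n_samples=6, rng=random):
    """Pick n_samples grid points: a random start inside the first
    interval, then one point every `interval` points after it."""
    interval = total_points // n_samples
    if interval < 1:
        raise ValueError("need at least n_samples grid points")
    start = rng.randint(1, interval)  # randint is inclusive on both ends
    return [start + i * interval for i in range(n_samples)]


def disector_volume(frame_side_um, section_thickness_um, n_sections):
    """Volume of the counting stack: frame area * thickness * sections."""
    return frame_side_um ** 2 * section_thickness_um * n_sections


def numerical_density(synapse_count, stack_volume_um3):
    """Synapses per cubic micrometre within the counting stack."""
    return synapse_count / stack_volume_um3


if __name__ == "__main__":
    # 50-pixel grid spacing at 1 µm per pixel gives 2,500 µm² per point.
    area_per_point = 50 * 50
    points = [10, 12, 8]   # hypothetical point counts for three sections
    spacing = 300.0        # hypothetical section spacing T in µm
    print("volume (µm³):", cavalieri_volume(points, area_per_point, spacing))
    print("sampling positions:", systematic_random_positions(sum(points)))
    print("stack volume (µm³):", disector_volume(5.75, 0.086, 31))
```

With the quoted frame, thickness and section count, the stack volume comes out at about 88 µm³, which agrees with the figure given in the text.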

Synapse size was calculated by measuring mean synapse length and mean synapse
area in the same serial sections used for estimations of synapse density and total
synapse number, as described.

For routine 2D analyses, semi-thin (1 µm) sections were stained with toluidine
blue and examined to select areas for ultramicrotomy. Ultrathin sections (~70 nm)
were stained with lead citrate and examined, blind, in a Jeol 100-CXII electron
microscope (JEOL (UK) Ltd, Welwyn Garden City, UK) equipped with a ‘Megaview
II’ digital camera (Olympus Soft Imaging Solutions GmbH, Münster, Germany).


A series of images was recorded from the stratum radiatum, all at a distance of
approximately 100 µm from the CA1 pyramidal layer to avoid the large dendritic
profiles in the proximal area. Thirty-one images, each encompassing an area of 55 µm²,
from each of two to three mice were used for scoring. For synapse quantification 
the following criteria were followed: the presence of an unambiguous postsynaptic 
density, a clear synaptic cleft, and three or more synaptic vesicles. An average of 
300 synapses were counted per sample. 
Quantification of numbers of neurons in CA1 region of hippocampus. For CA1
pyramidal layer volume analysis, the whole hippocampus was sectioned and fixed
sections of 5 µm thickness, taken every 250 µm, were stained with haematoxylin
and eosin. The volume of the pyramidal layer was measured using the Cavalieri
estimator (points grid). For mean neuron density, slices were stained with NeuN
and density was calculated within an unbiased virtual space. The total number of
neurons in the CA1 pyramidal layer was estimated from the neuron density and the
volume of the hippocampus.
Immunoblotting. Protein samples were isolated from hippocampi using protein
lysis buffer (50 mM Tris, 150 mM NaCl, 1% Triton X-100, 1% Na deoxycholate,
0.1% SDS and 125 mM sucrose) supplemented with Phos-STOP and protease inhibitors
(Complete, Roche), followed by centrifugation and quantification. Protein
levels were determined by resolving 20 µg of protein on SDS-polyacrylamide gel
electrophoresis gels, which were transferred onto either nitrocellulose or PVDF
membranes and incubated with primary antibodies. Synaptic proteins were detected
using the following antibodies: SNAP-25 (1:10,000; catalogue number: ab5666, Abcam),
VAMP2 (1:5,000; catalogue number: 104204, Synaptic Systems), NMDA-R1 (1:1,000;
catalogue number: G8913, Sigma) and PSD95 (1:1,000; catalogue number: 04-1066,
Millipore). Odyssey IRDye800 secondary antibodies (1:5,000; catalogue number:
926-32210/32211, LI-COR) were applied, visualized and quantified using an Odyssey
infrared imager (LI-COR; software version 3.0). PrP levels were determined
using the primary antibody ICSM35 (1:10,000; catalogue number: 0130-03501,
D-GEN). PrPSc was detected after proteinase K digestion. Cold-shock protein
levels were determined with antibodies against CIRP (1:1,000; catalogue number:
10209-2-AP, Proteintech Group, Inc.) and RBM3 (1:500; catalogue number: 14363-1-AP,
Proteintech Group, Inc.). Amyloid-β levels in 5XFAD mice were detected by the 6E10
clone antibody (1:1,000; catalogue number: SIG-39320, Covance). Horseradish-peroxidase-conjugated
secondary antibodies (1:10,000; DAKO) were applied and
protein visualized using enhanced chemiluminescence (GE Healthcare) and quantified
using ImageJ. An antibody against GAPDH (1:5,000; catalogue number:
sc32233, Santa Cruz) was used to determine gel loading.
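
Quantification against a loading control, as used here with GAPDH, amounts to a per-lane ratio. The sketch below is illustrative only: the function names and band intensities are hypothetical, and scaling to the mean of the control lanes is an assumed convention, not one stated in this section.

```python
from statistics import mean


def normalise_to_loading(target_bands, loading_bands):
    """Divide each target band intensity by the loading-control
    (for example GAPDH) intensity measured in the same lane."""
    if len(target_bands) != len(loading_bands):
        raise ValueError("one loading-control band is needed per lane")
    return [t / g for t, g in zip(target_bands, loading_bands)]


def relative_to_control(values, control_values):
    """Express normalised values as a fraction of the control-group mean
    (assumed convention for reporting, e.g. '30% of control')."""
    baseline = mean(control_values)
    return [v / baseline for v in values]


if __name__ == "__main__":
    # Hypothetical band intensities in arbitrary units.
    control = normalise_to_loading([200.0, 220.0, 180.0], [100.0, 110.0, 90.0])
    treated = normalise_to_loading([60.0, 66.0], [100.0, 110.0])
    print(relative_to_control(treated, control))
```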
Lentiviruses. GenTarget (San Diego, CA, USA) generated lentiviral plasmids. The 
neuron-specific promoter CAMKII was used to drive RBM3; the H1 promoter was 
used for shRNA-RBM3 expression and scrambled sequence-shRNA. Viruses were 
injected stereotaxically into the CA1 region of the hippocampus as described.

Mouse RBM3 isoform 2 (NM_001166410.1) overexpression was induced using 
the pLentiCAMKII (RBM3)Rsv (GFPBsd) plasmid. pLentiCAMKII (empty)Rsv 
(GFPBsd) was used as control. RBM3 downregulation was achieved by using
pLentiH1shRNA(m RBM3) sequence number 2Rsv(GFPBsd). This plasmid contains
the following shRNA-RBM3 sense, anti-sense and loop sequences (sequence
number 2: 5′-GTTGATCATGCAGGAAAGTCTcgagAGACTTTCCTGCATGATCAAC-3′).
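
The structure of the shRNA insert (sense arm, loop, anti-sense arm) can be checked directly from the sequence. The sketch below is an illustration, not part of the original Methods: it assumes the lowercase run marks the loop, removes the whitespace introduced by line breaks, and uses hypothetical helper names.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def split_hairpin(hairpin):
    """Split a hairpin into (sense, loop, antisense), treating the
    lowercase run as the loop (an assumption about the notation)."""
    seq = "".join(hairpin.split())  # drop spaces and line breaks
    lower = [i for i, ch in enumerate(seq) if ch.islower()]
    if not lower:
        raise ValueError("no lowercase loop found")
    start, end = lower[0], lower[-1] + 1
    return seq[:start], seq[start:end], seq[end:]


# Sequence number 2 as printed in the text, written 5' to 3'.
HAIRPIN = "GTTGATCATGCAGGAAAGTCTcgagA GACTTTCCTGCATGA TCAAC"

if __name__ == "__main__":
    sense, loop, antisense = split_hairpin(HAIRPIN)
    print(sense, loop, antisense)
    print("arms pair:", reverse_complement(sense) == antisense)
```

Under this reading the two 21-nucleotide arms are exact reverse complements, which is what allows the transcript to fold into a hairpin.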

pLentiH1shRNA (negative control)Rsv (GFPBsd) containing the sequence
5′-GTCTCCACGCGCAGTACATTT-3′ was used as control. Lentiviral sequences and
viral stocks were generated by GenTarget (San Diego, CA, USA). Virus titre was 
determined using FACS (BD FACS Calibur). Viruses were used with a final titre of 
0.6 X 10° to 1.5 X 10° transducing units. 
Stereotaxic injection. Under general anaesthetic, mice were injected with 5 µl of
lentivirus per site into the CA1 region of the hippocampus. Mice were injected at 2
locations per hemisphere: at −2 mm and −2.7 mm posterior, ±2 mm lateral and
−2.2 mm ventral relative to bregma, using a 26s-gauge needle and Hamilton
syringe as described.
Burrowing assay. This was performed as described. Briefly, mice were placed
in individual large plastic cages containing a clear Perspex tube, 20 cm long × 6.8 cm
diameter, filled with 140 g of normal food pellets. The weight of pellets remaining
in the tube was measured after 2 h and the percentage burrowed calculated. Behavioural
data were analysed using one-way ANOVA with Brown-Forsythe test and Tukey’s
post hoc test. For behavioural testing no formal randomization was needed or used.
The experimenter was blind to group allocation during all experiments and when
assessing outcome.
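
The burrowing score is a simple proportion of the 140 g of pellets initially placed in the tube. A minimal sketch follows; the function name and the example weight are hypothetical.

```python
def percent_burrowed(remaining_g, initial_g=140.0):
    """Percentage of pellets displaced from the tube after the test,
    given the weight remaining and the starting weight (140 g here)."""
    if not 0 <= remaining_g <= initial_g:
        raise ValueError("remaining weight must lie between 0 and the initial weight")
    return (initial_g - remaining_g) / initial_g * 100.0


if __name__ == "__main__":
    # Hypothetical example: 35 g of pellets left in the tube after 2 h.
    print(percent_burrowed(35.0))
```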
Novel object recognition memory. This was performed as described. Briefly,
mice were tested in a black cylindrical arena (69 cm diameter) mounted with a 100 
LED strip infrared light source and a high resolution day/night video camera (Sony). 
Mice were acclimatized to the arena 5 days before testing. During the learning phase, 
two identical objects were placed 15 cm from the sides of the arena. Each mouse 
was placed in the arena by an operator blind to the experimental group for two 
blocks of 10 min for exploration of the objects with an inter-trial interval of 10 min. 


©2015 Macmillan Publishers Limited. All rights reserved 


Two hours later, one of the objects was exchanged for a novel one, and the mouse 
was replaced in the arena for 5 min (test phase). The amount of time spent explor- 
ing all objects was tracked and measured for each animal using Ethovision software 
(Tracksys). All objects and the arena were cleansed thoroughly between trials to 
ensure the absence of olfactory cues. The amount of time spent exploring the novel 
object over the familiar object is expressed as a ratio, where a ratio of 1 reflects ran- 
dom exploration, and >1 reflects memory. Behavioural data were analysed using 
one-way ANOVA with Brown-Forsythe test and Tukey’s post hoc test. For behavioural
testing no formal randomization was needed or used. The experimenter was
blind to group allocation during all experiments and when assessing outcome.
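
The memory index described above is the ratio of time spent on the novel object to time spent on the familiar one. The sketch below assumes the two exploration times are already available in seconds from the tracking software; the function name and example times are hypothetical.

```python
def novel_object_ratio(novel_s, familiar_s):
    """Ratio of exploration time on the novel object to that on the
    familiar object; 1 reflects random exploration, above 1 reflects memory."""
    if novel_s < 0 or familiar_s <= 0:
        raise ValueError("exploration times must be positive")
    return novel_s / familiar_s


if __name__ == "__main__":
    # Hypothetical exploration times in seconds from a 5-min test phase.
    print(novel_object_ratio(45.0, 30.0))
    print(novel_object_ratio(30.0, 30.0))
```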
Electrophysiology. Whole-cell recordings were made in acute hippocampal slices
to measure synaptic transmission from identified CA1 neurons, and recording was
performed as described. In brief, neurons were voltage clamped using a Multiclamp
700B amplifier and pClamp 10.3 software (Molecular Devices) and EPSCs were
evoked by stimulation with a bipolar platinum electrode at 37 °C. Pipettes (2.5-3.5 MΩ)
were filled with a solution containing (in mM): KCl 110, HEPES 40, EGTA 0.2,
MgCl2 1, CaCl2 0.1; pH was adjusted to 7.2 with KOH. Neurons were visualized
with ×60 objective lenses on a Nikon FS600 microscope fitted with differential
interference contrast optics. Four to eight cells were measured per mouse in at least 
two animals per experiment. Male mice were used to avoid effects of the oestrus 
cycle. 

Hippocampal slice preparation and 35S-methionine labelling. Slices were dissected
in an oxygenated cold (2-5 °C) sucrose artificial cerebrospinal fluid (ACSF)
containing 26 mM NaHCO3, 2.5 mM KCl, 4 mM MgCl2, 0.1 mM CaCl2 and
250 mM sucrose. Hippocampal slices were prepared using a tissue chopper (McIlwain).
Slices were allowed to recover in normal ACSF buffer while being oxygenated at
37 °C for 1 h, then incubated with [35S]-methionine label for 1 h, then homogenized.
Proteins were TCA precipitated and incorporation of radiolabel was measured
by scintillation counting (WinSpectral, Wallac).

Statistics. Statistical analyses were performed using Prism v5 software, using Student’s 
t-test for data sets with normal distribution and a single intervention; when the 
F-test to compare variances was significant, Mann-Whitney U-test was performed 
instead. 
Behavioural data, neuronal counts and 35S-met were analysed using one-way
ANOVA and Tukey’s post hoc test for multiple variables. 

For behavioural testing no formal randomization was needed or used. Experimenter 
was blind to group allocation during the experiments and when assessing outcome. 
Statistical analyses for in vivo experiments. Sample size estimation for induction
of hypothermia for volume, synaptic density and estimation of total number of synapses
was based on the effect size of the preliminary experiment using the disector method
(5.9511) and obtained with the free software G*Power version 3.1.9.2.

The software prediction shows that with a sample size of 6 animals for 2 conditions
(control and cooled or cooled and rewarmed), the experiment has a 99.8%
chance of detecting a difference and avoiding a type II error (β-error), with a 0.05%
chance of a type I error (α-error). Sample size estimation for the novel object recognition
experiment was established based on the effect size of 1.6161 from control and
prion mice at 8 w.p.i. This parameter was applied in the F-test power analysis
calculation with G*Power version 3.1.9.2.

Similar analyses were performed for burrowing tests. 
Extended Data Figure 1 | Stereological assessment of volume and synapse
size to validate 2D assumption-based approaches for counting synapse
density. a, CA1 volume and synapse mean length and area in the stratum
radiatum remain essentially unchanged on cooling and rewarming in wild-type
mice. Volume was measured using the disector principle and synapse mean length
and area determined in the same sections, as described; n values as reported
for Fig. 1a. b, c, Representative electron micrographs (pseudo-coloured for
ease of synapse identification) for data not shown in Fig. 1b, c, from prion-
infected mice at 4 and 6 w.p.i. (b) and for 5XFAD mice at 2 and 3 months
(c) before cooling (black framed images) and cooled (blue framed images).
d, Schematic showing lost capacity for structural plasticity precedes synapse
loss and neuronal loss in both mouse models. Scale bar, 1 µm. All data in bar
charts are mean ± s.e.m. Student’s t-test, two tailed. Non-significant P values.
Extended Data Figure 2 | Synaptic protein levels during cooling-rewarming
in prion and 5XFAD mice. a, Levels of presynaptic (SNAP25, VAMP2) and
postsynaptic (PSD95, NR1) proteins do not change before (black bars) and
after cooling to 16-18 °C (blue bars) in prion-infected mice at 4 and 6 w.p.i.
b, 5XFAD mice at 2 and 3 months. Representative western blots are shown for
3 mice per temperature and time point. Bar graphs show quantification of
synaptic protein levels relative to GAPDH. All data represent means ± s.e.m.
(n = 3-11 mice per time point). Student’s t-test, two tailed. n.s. = non-significant
P values.
Extended Data Figure 3 | Cooling does not induce changes in PrPSc or
amyloid-β levels. a, Levels of total PrP (upper blot) and PrPSc (lower blot) do
not change notably before (white line), during (blue line) or after (red line)
cooling to 16-18 °C in prion-infected mice. PrPSc is detected after digestion
with proteinase K. Levels are undetectable by western blotting at 6 w.p.i., as
expected. b, Cooling does not change levels of amyloid-β oligomers in 5XFAD
mice; arrow indicates amyloid-β monomers (lane 1, synthetic amyloid-β
oligomers; last lane, one-year-old 5XFAD control (C+)). Representative
western blots are shown for 3 mice per temperature and time point.
Non-significant P values.
Extended Data Figure 4 | Cooling induces sustained increase in RBM3
levels but not in CIRP. a, b, Levels of CIRP do not change after cooling in
prion-infected mice at 4 and 6 w.p.i. (a) or in 5XFAD mice at 2 and 3 months
(b). Representative western blots are shown for 3 mice per temperature and
time point. Bar graphs show quantification of CIRP levels relative to GAPDH.
All data represent means ± s.e.m. (n = 6-9 mice per time point). Student’s
t-test, two tailed. n.s. = non-significant P values. c, Increased levels of RBM3 are
sustained for at least 72 h after cooling in wild-type mice. Bar graph shows
quantification of RBM3 against GAPDH in control (white bar), cooled (blue
bar), and 12, 48 and 72 h recovery after cooling (red bars). All data represent
means ± s.e.m. (n = 3-6 mice per time point, *P < 0.05, Mann-Whitney
U-test, two tailed).
Extended Data Figure 5 | Early cooling induces sustained elevation of
RBM3 levels. RBM3 levels remain high after cooling to 16-18 °C in prion-
infected mice (magenta boxes) compared to control prion-infected mice. These
levels remained high up to 6 weeks later and declined at 12 w.p.i. Representative
western blots are shown for 3 mice per time point.
Extended Data Figure 6 | Exploration time in exposure phase of novel
object testing is normal in all groups and RBM3 knockdown abolishes
improved memory after cooling. a, Exploratory behaviour measured in
seconds is not different in mice with early cooling from prion-diseased mice
and is not affected by the duration of disease (n as reported in Fig. 3d).
b, c, Lentivirally mediated RNAi of RBM3 eliminates the protective effect of
cooling on novel object memory impairment in prion disease (b) (dark green
bar), but does not affect exploratory behaviour in training phase (c). All data
represent means ± s.e.m. Data analysed using one-way ANOVA, Brown-
Forsythe test with Tukey’s post hoc analysis for multiple comparisons
(n = 11-16 mice per time point, **P < 0.01).
Extended Data Figure 7 | Induction of hypothermia at time point when
RBM3 induction fails is not neuroprotective. Cooling at 5 and 6 w.p.i., when
synaptic plasticity and RBM3 induction fail (see Figs 1 and 2, main text), does
not increase survival in prion-infected mice. Kaplan-Meier survival plots for
prion-infected mice (black line, no cooling, n = 10; orange line, mice cooled at
5 and 6 w.p.i., n = 16). Student’s t-test, two tailed. Non-significant P values.
following treatment with LV-RBM3 (dark green) and LV-shRNA-RBM3 uninfected control mouse.
Extended Data Figure 9 | Mild hypothermia also extends survival in prion-infected mice. Kaplan-Meier plot showing that cooling to 26 °C at an early stage
also significantly lengthens survival (n = 27 cooled vs n = 16 non-cooled mice); **P < 0.01, Student’s t-test, two tailed. 
Extended Data Figure 10 | RNAi of RBM3 downregulation accelerates
impaired structural synaptic plasticity in the 5XFAD mouse model, and also
reduced synapse number and function in wild-type mice. a, Impaired
structural synaptic plasticity after cooling occurs in shRNA-RBM3 treated
5XFAD mice at 3 months. Representative electron micrographs are shown and
are pseudo-coloured as in main text figures. Quantification shows significant
reduction in synapse number by RNAi of RBM3 (n = 82-93 images from 3
mice per time point, Student’s t-test, two tailed). b, RBM3 knockdown reduces
synapse number and novel object memory in wild-type mice (n = 93 images
from 3 mice per time point, Student’s t-test, two tailed, ***P < 0.0001; for novel
object recognition task n = 11 mice, LV-shRNA-control and 10 mice,
LV-shRNA-RBM3, Mann-Whitney U-test, *P < 0.05). Scale bar, 1 µm.
Broad and deep tumour genome sequencing has shed new light on
tumour heterogeneity and provided important insights into the evolution
of metastases arising from different clones. There is an additional
layer of complexity, in that tumour evolution may be influenced
by selective pressure provided by therapy, in a similar fashion to that 
occurring in infectious diseases. Here we studied tumour genomic 
evolution in a patient (index patient) with metastatic breast cancer bear- 
ing an activating PIK3CA (phosphatidylinositol-4,5-bisphosphate 
3-kinase, catalytic subunit alpha, PI(3)Kα) mutation. The patient was
treated with the PI(3)Kα inhibitor BYL719, which achieved a lasting
clinical response, but the patient eventually became resistant to this 
drug (emergence of lung metastases) and died shortly thereafter. A 
rapid autopsy was performed and material from a total of 14 metas- 
tatic sites was collected and sequenced. All metastatic lesions, when 
compared to the pre-treatment tumour, had a copy loss of PTEN (phos- 
phatase and tensin homolog) and those lesions that became refract- 
ory to BYL719 had additional and different PTEN genetic alterations, 
resulting in the loss of PTEN expression. To put these results in con- 
text, we examined six other patients also treated with BYL719. Acquired 
bi-allelic loss of PTEN was found in one of these patients, whereas in 
two others PIK3CA mutations present in the primary tumour were 
no longer detected at the time of progression. To characterize our 
findings functionally, we examined the effects of PTEN knockdown 
in several preclinical models (both in cell lines intrinsically sensitive 
to BYL719 and in PTEN-null xenografts derived from our index 
patient), which we found resulted in resistance to BYL719, whereas 
simultaneous PI(3)K p110β blockade reverted this resistance phenotype.
We conclude that parallel genetic evolution of separate metastatic
sites with different PTEN genomic alterations leads to a convergent
PTEN-null phenotype resistant to PI(3)Kα inhibition.

We are currently engaged in testing the antitumour activity of a novel
PI(3)Kα inhibitor, BYL719, in patients with tumours harbouring activating
PI(3)Kα mutations. The PI(3)K pathway is essential for cell
growth, proliferation, survival, and metabolism. The PI(3)K family of
enzymes is divided into three main classes (I to III), with class I being the
most often implicated in human cancer. Class IA PI(3)K is a heterodimer
composed of a catalytic subunit (p110α, β or δ) and a regulatory
subunit. PIK3CA, the gene encoding p110α, is mutated in up to 40%
of oestrogen receptor (ER) and/or HER2 positive breast tumours. In
our ongoing phase I clinical study of BYL719, we have observed clinical
responses in breast, head and neck and other tumours, providing proof
of principle that PI(3)Kα targeting is active against tumours harbouring
PIK3CA mutation.

We present the case of a 60-year-old breast cancer patient (index patient) 
diagnosed with invasive ductal carcinoma who underwent surgery followed 
by adjuvant treatment with chemotherapy and the aromatase inhibitor
exemestane. Four years later, the patient developed bone metastases
and started therapy with the ER antagonist fulvestrant, achieving stable
disease. After 18 months on therapy, her disease progressed in the liver,
bone and lymph nodes. The archival tissue of the primary tumour was
subjected to PCR-based genetic analysis and a hot spot mutation in
PIK3CA (E542K) was detected. This finding led to the patient’s enrolment
in a phase I clinical trial testing the tolerability and antitumour
activity of BYL719 (NCT01219699). The patient rapidly achieved a confirmed
partial response according to the RECIST 1.0 criteria that lasted
9.5 months (Table 1 and Extended Data Fig. 1). At that point, while the 
tumour remained stable in multiple sites including a peri-aortic lymph 
node location, progression occurred in the lungs (Fig. 1) and consequently 
Figure 1 | Clinical response of index patient treated with BYL719. CT scans
showing stable (responding) peri-aortic lymph node metastasis (yellow circles, 
left column) and the appearance of new lung metastatic lesions (yellow circles, 
right column) after the completion of the tenth cycle of BYL719 therapy. 
Arrow, pleural effusion. 
Figure 2 | Loss of PTEN upon BYL719 resistance. a, Circos plots from WGS
analysis of primary tumour (before BYL719 treatment) and a lung metastasis 
appearing after the tenth cycle of BYL719 therapy. b, Copy number variation 
of chromosome 10. c, WES of the peri-aortic lymph node showing durable 
stable disease during BYL719 therapy compared to both primary tumour 
and the progressing lung lesion. The diagram shows the variant allele 
fractions (VAF) of the listed gene mutations in the three lesions. The estimated
tumour purities are 44% for the breast primary tumour, 50% for the lung 
metastasis, and 59% for the lymph node metastasis. d, PTEN IHC of primary 
tumour, peri-aortic lymph node, and lung metastasis. Images were taken from 
Servier Medical Art (licensed under a Creative Commons Attribution 3.0 
Unported License). 
Table 1 | Patient information

Patient  Site      PIK3CA    Dose  Response     DOT     PTEN    PIK3CA
                   baseline  (mg)  (RECIST)     (days)  Post-T  Post-T

Index    Breast    E542K     400   PR (−52.4%)  285     Loss    E542
1        Breast    H1047R    400   SD*          181     Unch.   H1047R
2        Breast    H1047R    400   SD (−26.3%)  424     Loss    H1047R
3        Breast    H1047R    400   SD (−28.5%)  179     Unch.   WT
4        Breast    H1047L    400   SD (−11.3%)  504     Unch.   H1047L
5        Breast    H1047R    400   SD (−24.9%)  110     Unch.   H1047R
6        Salivary  E545K     400   SD (−17%)    112     Unch.   WT

RECIST, Response Evaluation Criteria In Solid Tumors; DOT, duration of treatment; PR, partial
response; SD, stable disease; Unch., unchanged; WT, wild-type; Amp., amplified; Post-T,
post-treatment.

* With interval decrease in left breast and left chest wall skin thickening.


therapy with BYL719 was discontinued. The clinical status of the patient 
deteriorated rapidly and she died two months after termination of the 
BYL719 treatment. A rapid autopsy was performed three hours after 
death and a total of 14 metastases with tumour cells present were iden- 
tified and collected for sequencing (Extended Data Table 1). 

In order to proceed systematically to identify possible genetic determinants
of acquired resistance to PI(3)Kα inhibition, we took a three-step
approach. First, we examined both the primary tumour (before
BYL719 treatment) and the new lung metastasis by whole genome se- 
quencing (WGS). Although both samples shared many somatic genetic 
aberrations (Fig. 2a and Extended Data Fig. 2), PTEN copy number loss 
was detected only in the lung metastasis (Fig. 2b). Second, we analysed 
by whole exome sequencing (WES) the primary tumour, lung meta- 
stasis, and the peri-aortic lesion that remained stable (responding) at 
the time of progression to BYL719 therapy (Fig. 2c). This analysis revealed 
that both peri-aortic and lung lesions harboured mutations in PIK3CA, 
ESR1 and BRCA2, and single copy loss of PTEN. Importantly, in addi- 
tion to the PTEN copy number loss, we identified a PTEN del339FS 


(frameshift) mutation only in the lung metastasis (Fig. 2c). By immu- 
nohistochemistry (IHC), we observed that PTEN protein expression was 
lost in the lung metastasis but was present in both the primary tumour 
and peri-aortic lesion (Fig. 2d). 

Third, to confirm and expand our findings, we sequenced the primary
tumour and all the metastatic lesions to >500-fold coverage using
a custom targeted deep-sequencing assay, IMPACT (Methods). A
number of mutations were shared by the primary tumour and the metas- 
tatic sites, whereas others were observed only in all or in selected meta- 
static lesions (Fig. 3a and Supplementary Table). We confirmed that 
the PIK3CA E542K mutation in the primary tumour was conserved in 
the metastatic samples and detected the presence of another PIK3CA 
mutation (D725G). Moreover, we found increased copy number of FGFR1 
and EIF4EBP1 in all tumour samples, consistent with the relatively frequent
8p11-12 amplification described in breast cancer. ESR1 Y537N
and BRCA2 L971S alterations were present in all the metastatic lesions 
but not in the primary tumour. We speculate that the ESR1 Y537N mutation,
reported to promote ligand-independent ER activation, was
selected upon anti-oestrogen therapy received by the patient before 
BYL719 treatment. 

Central to our work, all metastatic lesions appeared to harbour a single 
copy loss of PTEN (Extended Data Fig. 3). Furthermore, we found that 
10 of the 14 metastatic lesions harboured additional genomic altera- 
tions within PTEN. The spectrum of PTEN alterations was heterogen- 
eous across the 10 samples and included a splice site mutation at K342, 
a frameshift indel at P339 (confirming the WES result), and 4 different 
exon-level deletions (Fig. 3a and Extended Data Fig. 4). All 10 speci- 
mens with either secondary PTEN mutations or copy number loss were 
confirmed negative for PTEN staining by IHC, whereas the four speci- 
mens that retained a PTEN wild-type allele were positive for PTEN 
protein expression (Extended Data Fig. 5). In addition, for those lesions 
that were visualized by CT (computerized tomography) scan, there was
a tight correlation between progression of disease and loss of PTEN 
expression. The peri-aortic lesion (M02) that was responding at the time 
of disease progression still contained one PTEN wild-type allele and 
protein expression. Conversely, the lung lesions (M04, M06, M09 and 
M11) and liver lesion (M12) with documented progression to therapy 
had bi-allelic PTEN alteration and lack of expression. 

In an effort to integrate the genomic data from our patient, we con- 
structed a dendrogram mapping the phylogenetic evolution of the dis- 
ease. Our findings suggest that all the lesions were derived from the 
PTEN wild-type primary tumour, and that there was a progressive and 
parallel loss of PTEN under BYL719 selective pressure (Fig. 3b). Of note, 
the two-month duration between progression to BYL719 and autopsy 
needs to be considered as well. 

In order to expand our observations, we analysed paired samples (pre- 
treatment and at progression) from six additional patients enrolled in 
the same study at our institution (Table 1). Targeted sequencing iden- 
tified homozygous loss of PTEN in a post-treatment sample of a breast 
cancer patient who developed resistance to BYL719 after initially experi- 
encing a durable response to therapy (Table 1). We also confirmed lack 
of PTEN expression by IHC in the post-treatment sample (Extended 
Data Fig. 6). We found no detectable PIK3CA mutations in the post- 
treatment samples of two patients (Table 1). Given that the presence of 
PIK3CA mutations drove sensitivity to BYL719 in our cell line screens”,
positive selection of clones bearing wild-type alleles of PIK3CA may 
explain the emergence of resistance to BYL719 in these two additional 
cases. These results may be an indication that in some cases PIK3CA 
mutations are not early founder/truncal events but branched subclonal 
drivers that are cleared from the tumours under the selective pressure 
of PI(3)Kα inhibition. In any case, the fact that loss of PTEN expression
and emergence of PIK3CA wild-type clones are mutually exclusive in 
our patient samples indicates that both events may be important in 
opposing the therapeutic efficacy of BYL719. 

No other alterations with an obvious connection to BYL719 resist- 
ance were found in the responding cases, with the exception of a mu- 
tation (E1490*) and an in-frame deletion in MAP3K1 in one of the 
three patients for whom neither PTEN nor PIK3CA status changed 
during BYL719 treatment. Further characterization is needed to deter- 
mine whether these mutations lead to increased MEK and ERK signal- 
ling and limit the effects of PI(3)K inhibition. 

Figure 4 | Loss of PTEN expression and sensitivity to PI(3)Kα and PI(3)Kβ
blockade. a, Western blot showing PTEN knockdown by two independent
shRNAs and its effects on the PI(3)K/AKT/mTOR pathway. b, Cell viability
assay in T47D cells with inducible PTEN knockdown (shPTEN no. 1 and no. 2)
or PTEN expressing controls (shRenilla) treated with increasing concentrations
of BYL719 or BKM120. Error bars, s.e.m. c, Antitumour activity of either
BYL719 (25 mg kg⁻¹ daily) or BKM120 (25 mg kg⁻¹ daily) in PDXs
subcutaneously grown in nude mice (n = 6 (vehicle) and n = 8 (treatments)).
Error bars, s.e.m. d, Representative immunostaining for phosphorylated AKT
(pAKT) and phosphorylated S6 (pS6) in PDXs treated as shown. Tumours were

PTEN encodes a phosphatase that regulates the activity of PI(3)K
by limiting the accumulation of phosphatidylinositol-3,4,5-trisphosphate
(PtdIns(3,4,5)P3 or PIP3), a required mediator to initiate the PI(3)K/
AKT/mTOR signalling cascade’®. In the absence of PTEN, cancer cells
become dependent mostly on the activity of the p110β isoform of
PI(3)K (PI(3)Kβ) to propagate signalling through downstream pathway
effectors’’°. Therefore, we hypothesized that progressive decrease or
loss of PTEN expression in the presence of PI(3)Kα inhibition might
restore PI(3)K/AKT signalling through PI(3)Kβ activity. To test our
hypothesis, we established cell lines expressing either doxycycline- 
inducible or constitutive short hairpin (sh)RNA against PTEN messenger 
RNA using T47D and MCF7 cells, known to be intrinsically sensitive to 
BYL719”. As expected, induction of PTEN downregulation led to activation
of AKT and the downstream effectors PRAS40, GSK3β and S6
in both T47D (Fig. 4a) and MCF7 (data not shown) cells under basal con- 
ditions. PTEN downregulation markedly limited the effects of BYL719, 
both at the signalling and cell viability level. On the other hand, PTEN 
knockdown did not result in resistance to the pan-PI(3)K inhibitor 
BKM120, which blocks all the PI(3)K p110 isoforms (Fig. 4b and Ex- 
tended Data Fig. 7a). Similar effects were observed in another BYL719- 
sensitive cell line (MDA-MB-453) with constitutive PTEN knockdown 
(Extended Data Fig. 7b). 

From our patient’s non-responding PTEN-null lung metastatic lesion, 
we were able to establish xenografts in nude mice. Consistent with the 
in vitro data, this patient-derived xenograft (PDX) was resistant to BYL719 
treatment but sensitive to BKM120 (Fig. 4c). The degree of inhibition of 
phospho-AKT and phospho-S6 was also higher with BKM120 (Fig. 4d 
and Extended Data Fig. 7c and d). These results were complemented 
by the combination of BYL719 and the PI(3)Kβ inhibitor AZD6482.
Upon PTEN knockdown, only combined PI(3)Kα and PI(3)Kβ blockade was
capable of reverting the resistant phenotype (Fig. 4e and Extended Data 
Fig. 8a). Similarly, the BYL719-resistant PDX was insensitive to AZD6482 
alone but responded to the combination of both compounds (Fig. 4f). 
Profound inhibition of AKT and S6 phosphorylation was achieved only 
upon treatment with BYL719 in combination with AZD6482 (Fig. 4g 
and Extended Data Fig. 8b and c). Taken together, these data indicate 
that inhibition of the PI(3)Kβ isoform is required to achieve antitumour
activity in cells/tumours that have lost PTEN expression and become
resistant to BYL719. 

We have reported a case of parallel genetic evolution under selective 
therapeutic pressure leading to a progressive loss of PTEN expression 
and consequent gain of dependency on the PI(3)Kβ isoform. Parallel
evolution under selective pressure has been described in conditions 
where treatments are highly efficacious, such as in HIV”. Our case 
highlights that this tumour, despite its heterogeneity, was dependent 
on PI(3)K signalling, probably as a result of the presence of the same 
activating PIK3CA mutation in all the tumour sites. Upon continued 
suppression of PI(3)Kα, diverse genomic alterations emerged, leading
to PTEN loss as an alternative mechanism of PI(3)K activation. More- 
over, our study emphasizes the importance of tumour interrogation 
upon progression to therapy and the dynamic nature of tumour gen- 
omes under selective therapeutic pressure. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS

PIK3CA mutant cell lines MCF7 (E545K) and T47D (H1047R) (ATCC) were 
transduced with the retroviral TRMPV vector. Doxycycline (Sigma) was used to 
temporally activate the expression of a microRNA 30-embedded shRNA targeting 
Renilla luciferase (control) or PTEN mRNA. The hairpin sequences used were as 
follows. 

Renilla luciferase. CTCGAGAAGGTATATTGCTGTTGACAGTGAGCGCAG 
GAATTATAATGCTTATCTATAGTGAAGCCACAGATGTATAGATAAGCA 
TTATAATTCCTATGCCTACTGCCTCGGAATTC 

PTEN no. 1. CTCGAGAAGGTATATTGCTGTTGACAGTGAGCGACCAGC 
TAAAGGTGAAGATATATAGTGAAGCCACAGATGTATATATCTTCACCT 
TTAGCTGGCTGCCTACTGCCTCGGAATTC 

PTEN no. 2. CTCGAGAAGGTATATTGCTGTTGACAGTGAGCGCCCAGAT 
GTTAGTGACAATGAATAGTGAAGCCACAGATGTATTCATTGTCACTAA 
CATCTGGTTGCCTACTGCCTCGGAATTC 
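
The hairpin inserts listed above can be sanity-checked by confirming that the two stems flanking the loop pair with each other. The following is a minimal Python sketch, not part of the original methods. It assumes a layout read off the sequences themselves: a loop of TAGTGAAGCCACAGATGTA with a 22-nucleotide stem on each side. The names `split_hairpin` and `stem_mismatches` are hypothetical helpers introduced only for illustration.

```python
# Illustrative check of a miR-30-embedded shRNA insert.
# Assumption: 22-nt stems sit directly on either side of the loop below.
LOOP = "TAGTGAAGCCACAGATGTA"
STEM_LEN = 22

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Renilla luciferase control insert, copied from the list above.
RENILLA_INSERT = (
    "CTCGAGAAGGTATATTGCTGTTGACAGTGAGCGCAG"
    "GAATTATAATGCTTATCTATAGTGAAGCCACAGATGTATAGATAAGCA"
    "TTATAATTCCTATGCCTACTGCCTCGGAATTC"
)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_hairpin(insert: str):
    """Return the (5' stem, 3' stem) pair flanking the first loop occurrence."""
    insert = "".join(insert.split()).upper()
    start = insert.find(LOOP)
    if start < STEM_LEN:
        raise ValueError("loop not found, or 5' stem shorter than expected")
    end = start + len(LOOP)
    five_prime = insert[start - STEM_LEN:start]
    three_prime = insert[end:end + STEM_LEN]
    if len(three_prime) < STEM_LEN:
        raise ValueError("3' stem shorter than expected")
    return five_prime, three_prime


def stem_mismatches(insert: str):
    """1-based positions where the 5' stem does not pair with the 3' stem."""
    five_prime, three_prime = split_hairpin(insert)
    expected = reverse_complement(three_prime)
    return [i + 1 for i, (a, b) in enumerate(zip(five_prime, expected)) if a != b]
```

For the Renilla control, `stem_mismatches(RENILLA_INSERT)` returns `[1]`: the stems pair at every position except the first base of the 5' stem. The two PTEN inserts can be checked the same way.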

Cell viability was assessed using the tetrazolium-based MTT assay after 6 days of
treatment. All cell lines tested negative for mycoplasma contamination. Western
blotting was carried out using previously described methods”. All the in vitro experi- 
ments were performed in triplicate. 
Patient-derived xenografts and IHC. Animals were maintained and treated in 
accordance with Institutional Guidelines of Memorial Sloan Kettering Cancer 
Center (Protocol number 12-10-019). Tumours were implanted subcutaneously in 
six-week-old female athymic NU/NU nude mice. Once the tumours reached a volume 
of ~200 mm³, they were expanded in multiple mice, which were then randomized to
the following treatments: BYL719, BKM120 (a pan-class I PI(3)K inhibitor), or
AZD6482 (a PI(3)Kβ inhibitor), each administered orally at 25 mg kg⁻¹ once a
day. After treatment, mice were euthanized and tumours were harvested and pro- 
cured for IHC analysis. IHC was performed on a Ventana Discovery XT processor 
platform using standard protocols and the following antibodies: pAKT (S473) (D9E), 
Cell Signaling Technology, #4060, dilution 1:70. pS6 (S240/4) (D68F8)XP, Cell 
Signaling Technology, #5364, dilution 1:500. PTEN (138G6), Cell Signaling Tech- 
nology, #9559, dilution 1:30. 

All the in vivo experiments were run with at least n = 8 for each treatment arm. 

Two-way t-test was performed using GraphPad Prism (GraphPad Software). 
Error bars represent the s.e.m. *P < 0.05. 
Whole genome and whole exome sequencing. For whole genome (WG) and 
whole exome (WE) sequencing, DNA was derived from the primary tumour, lung 
metastasis, and peri-aortic lymph node metastasis. DNA from the spleen was used 
as a normal control. WG libraries were produced as previously described” and 
sequenced using the Illumina HiSeq 2500 platform as paired-end 100 base pair reads, 
producing ~30-fold (primary tumour, spleen normal) to 50-fold (lung metastasis)
coverage for WG sequencing. By hybrid capture (Nimblegen version 3.0) of the 
lymph node and lung metastases, primary tumour and spleen normal, we gener- 
ated ~100-fold coverage for WE sequencing. 

All patients provided written informed consent for the genetic research studies 
performed in accordance with protocols approved by Dana Farber/Harvard Cancer 
Center Institutional Review Board. Autopsy in the index patient was performed
within the first three hours post mortem. 

Targeted exome sequencing (IMPACT). DNA derived from the primary tumour, 
14 metastases, and matched normal spleen tissue was further subjected to deep- 
coverage targeted sequencing of key cancer-associated genes. Our assay, termed 
IMPACT (Integrated Mutation Profiling of Actionable Cancer Targets), involves 
hybridization of barcoded libraries to custom oligonucleotides (Nimblegen SeqCap) 
designed to capture all protein-coding exons and select introns of 279 commonly 
implicated oncogenes, tumour suppressor genes, and members of pathways deemed 
actionable by targeted therapies'*. The captured pool was subsequently sequenced 
on an Illumina HiSeq 2500 as paired-end 75-base pair reads, producing 513-fold 
coverage per tumour. Sequence data were analysed to identify three classes of somatic 
alterations: single-nucleotide variants, small insertions/deletions (indels), and copy 
number alterations. 

Barcoded sequence libraries were prepared using 250 ng genomic DNA (Kapa 
Biosystems) and combined in a single equimolar pool. Sequence data were demultiplexed
using CASAVA, and reads were aligned to the reference human genome
(hg19) using BWA and postprocessed using the Genome Analysis Toolkit (GATK)
according to GATK best practices**”’. 

MuTect and GATK were used to call single-nucleotide variants and small indels, 
respectively”®. Exon-level copy number gains and losses were inferred from the 
ratio in Tumour:Normal sequence coverage for each target region, following a loess
normalization to adjust for the dependency of coverage on GC content”.
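
The coverage-ratio step described above can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than the pipeline actually used: function names are hypothetical, each sample is scaled by its total coverage, and GC dependence is removed by centring each GC bin on its median, which is a deliberately simple stand-in for a fitted smoothing curve.

```python
import math
from statistics import median


def log2_ratios(tumour_cov, normal_cov):
    """Per-target log2(tumour/normal) after scaling each sample by its total coverage."""
    if any(c <= 0 for c in list(tumour_cov) + list(normal_cov)):
        raise ValueError("coverage values must be positive")
    t_total = sum(tumour_cov)
    n_total = sum(normal_cov)
    return [
        math.log2((t / t_total) / (n / n_total))
        for t, n in zip(tumour_cov, normal_cov)
    ]


def gc_correct(ratios, gc_fractions, bin_width=0.05):
    """Subtract the median log2 ratio of each GC bin.

    This bin-median centring is a stand-in for a smooth GC fit; it is not
    the normalization used in the paper.
    """
    bins = {}
    for ratio, gc in zip(ratios, gc_fractions):
        bins.setdefault(int(gc / bin_width), []).append(ratio)
    centre = {key: median(values) for key, values in bins.items()}
    return [
        ratio - centre[int(gc / bin_width)]
        for ratio, gc in zip(ratios, gc_fractions)
    ]
```

In this sketch a target region that has lost one of two copies sits one log2 unit below an otherwise identical diploid region, which is the kind of shift an exon-level copy number call relies on.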
Statistical analysis. Two-way t-tests were performed using GraphPad Prism 
(GraphPad Software). Error bars represent the s.e.m., P values are indicated as 
*P<0.05. All cellular experiments were repeated at least three times. All the 
in vivo experiments were run with at least 6-8 tumours for each treatment arm. 
Sample size was chosen to detect a difference in means of 20% with a power of 90%. 
Animals were randomized in groups with similar average in tumour size. Investigators 
were blinded when assessing the outcome of the in vivo experiments. 

For the cell viability graphs, nonlinear regression was applied to the experimental 
data sets. Curves were compared using the extra-sum-of-squares F test using 
α = 0.05. Hypothesis was rejected when nonlinear models were not nested within
each other and was considered statistically significant. 
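
For readers unfamiliar with the extra-sum-of-squares F test, the statistic itself is simple to compute once two fits are available. The Python sketch below is illustrative only and is not the GraphPad Prism implementation. It assumes a simpler model (for example one curve shared by all groups) and a more complex model that contains it (for example one curve per group), each summarized by its residual sum of squares and residual degrees of freedom. The function name is hypothetical.

```python
def extra_sum_of_squares_f(ss_simple, df_simple, ss_complex, df_complex):
    """F statistic comparing a simpler fit with a more complex fit that contains it.

    ss_* are residual sums of squares and df_* are residual degrees of freedom
    (number of data points minus number of fitted parameters).
    """
    if df_simple <= df_complex:
        raise ValueError("the simpler model must have more residual degrees of freedom")
    if ss_complex <= 0:
        raise ValueError("residual sum of squares of the complex model must be positive")
    numerator = (ss_simple - ss_complex) / (df_simple - df_complex)
    denominator = ss_complex / df_complex
    return numerator / denominator
```

The resulting value is compared with an F distribution with (df_simple - df_complex, df_complex) degrees of freedom; obtaining the P value requires a statistics library and is left out of this sketch.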
Extended Data Figure 1 | CT scan of index patient. CT scan showing a liver lesion (baseline) experiencing a partial response after 8 cycles (cycle 8) of BYL719.
Extended Data Figure 2 | Gene copy number variation in both primary tumour and lung metastasis.
Extended Data Figure 3 | Representative exon-level copy number profiles for genes on chromosome 10 in all 14 metastases collected from the index patient. Exons in PTEN are shown in red.
Extended Data Figure 4 | Loss-of-function mutations in PTEN detected by IMPACT in metastases M06 and M10. Mutations were visualized by the Integrative Genomics Viewer (IGV).
Extended Data Figure 5 | PTEN immunostaining of the 14 metastases collected during the autopsy. Haematoxylin and eosin (H&E) and PTEN expression detected by IHC in 14 metastases collected during the autopsy of the index patient. PTEN staining in PTEN negative samples is only present in stromal cells.
Extended Data Figure 6 | PTEN immunostaining in patients treated with BYL719. PTEN expression detected by IHC in paired samples from six additional patients treated with BYL719. Specimens before starting BYL719 therapy (baseline) and at time of disease progression (post-treatment) are compared.


©2015 Macmillan Publishers Limited. All rights reserved 


LETTER 


[Extended Data Figure 7 image. Panels a (MCF7) and b (MDA-MB-453): curves for shPTEN #1, shPTEN #2 and shRenilla plotted against BYL719 or BKM120 concentration (log2 µM). Panels c and d: lung metastasis PDX treated with vehicle, BYL719 or BKM120, including pS6 (S240/4).]

Extended Data Figure 7 | Inducible loss of PTEN and sensitivity to BYL719
and BKM120. a, Cell viability assay in MCF7 cells with inducible PTEN
knockdown treated with increasing concentrations of either BYL719 or
BKM120. Error bars, s.e.m. b, Cell viability assay in MDA-MB-453 (MDA453)
cells with constitutive PTEN knockdown treated with increasing
concentrations of either BYL719 or BKM120. Error bars, s.e.m.
c, Quantification of pAKT (S473) and pS6 (S240/4) from Fig. 4d. Student’s
t-test was used and P values are indicated. d, Western blot from the PDXs
treated as indicated.
Extended Data Figure 8 | Constitutive loss of PTEN and sensitivity to
BYL719 and AZD6482. a, Cell viability assay in T47D cells with inducible
PTEN knockdown (no. 2) treated with increasing concentrations of either
BYL719 or AZD6482 in the presence of doxycycline 1 µg ml⁻¹. Error bars,
s.e.m. b, Quantification of pAKT (S473) and pS6 (S240/4) from Fig. 4g.
Student’s t-test was used and P values are indicated. Error bars, s.e.m.
c, Western blot from the PDXs treated as indicated.
Extended Data Table 1 | Samples analysed from the index patient

Lesion   Location                  Cellularity  CT scan  Response  WGS  WES  IMPACT
Primary  Breast                    Unknown      -        -         Y    Y    Y
M01      Ovary                     75%          N        -         N    N    Y
M02      Periaortic lymph node     50%          Y        Y         Y    Y    Y
M03      Liver (posterior)         65%          N        -         N    N    Y
M04      Left Lung (lower lobe)    75%          Y        N         N    N    Y
M05      Thoracic spine            65%          N        -         N    N    Y
M06      Right Lung (upper lobe)   55%          Y        N         N    N    Y
M07      Liver (dome)              70%          N        -         N    N    Y
M08      Uterus                    55%          N        -         N    N    Y
M09      Left Lung (upper lobe)    75%          Y        N         N    N    Y
M10      Carina (lymph node)       70%          N        -         N    N    Y
M11      Right Lung (lower lobe)   65%          Y        N         Y    Y    Y
M12      Liver (left lower lobe)   50%          Y        N         N    N    Y
M13      Liver (left lobe)         70%          N        -         N    N    Y
M15      Adrenal gland             40%          N        -         N    N    Y

Summary of the lesions collected during the autopsy of the index patient, cellularity assessment, imaging, clinical outcome, and sequencing techniques used. N, no; Y, yes; -, no entry.
Apico-basal forces exerted by apoptotic cells drive epithelium folding

Bruno Monier, Melanie Gettings, Guillaume Gay, Thomas Mangeat, Sonia Schott, Ana Guarner & Magali Suzanne


Epithelium folding is a basic morphogenetic event that is essential 
in transforming simple two-dimensional epithelial sheets into three- 
dimensional structures in both vertebrates and invertebrates’. Fold- 
ing has been shown to rely on apical constriction”. The resulting 
cell-shape changes depend either on adherens junction basal shift? 
or on a redistribution of myosin II°*’, which could be driven by 
mechanical signals*. Yet the initial cellular mechanisms that trigger 
and coordinate cell remodelling remain largely unknown. Here we 
unravel the active role of apoptotic cells in initiating morphogenesis, 
thus revealing a novel mechanism of epithelium folding. We show 
that, in a live developing tissue, apoptotic cells exert a transient pull- 
ing force upon the apical surface of the epithelium through a highly 
dynamic apico-basal myosin II cable. The apoptotic cells then induce 
a non-autonomous increase in tissue tension together with cortical 
myosin II apical stabilization in the surrounding tissue, eventually 
resulting in epithelium folding. Together our results, supported by 
a theoretical biophysical three-dimensional model, identify an apop- 
totic myosin-II-dependent signal as the initial signal leading to cell 
reorganization and tissue folding. This work further reveals that, far 
from being passively eliminated as generally assumed (for example, 
during digit individualization’), apoptotic cells actively influence their 
surroundings and trigger tissue remodelling through regulation of 
tissue tension. 

In different morphogenetic contexts, apoptosis has been shown to 
have an essential role in tissue folding; however, the cellular mechanisms 
involved remain mostly unknown’. To characterize apoptosis- 
dependent folding, we focused on Drosophila leg epithelium morpho- 
genesis, a process that has been shown to rely on local apoptosis" 
(Extended Data Fig. 1a, Supplementary Video 1). Interestingly, we dis- 
covered that leg disc folding follows a stereotypical sequence, with fold 
progression following the spreading of cell death, beginning in the most 
ventral part, then progressing laterally to end in the most dorsal region 
of the developing leg (Fig. 1a, b, Extended Data Fig. 1b). To unravel the 
link between apoptotic cells and fold formation, we first focused on ap- 
optotic cell behaviour using live imaging. In the leg epithelium, apop- 
tosis follows the classical morphological steps including cell shrinkage, 
membrane blebbing and fragmentation into apoptotic bodies (Extended 
Data Fig. 1c). Initially, apoptotic cells remain columnar and attached 
to their neighbours, as described previously’* (Extended Data Fig. 2a). 
Indeed, we noticed that adherens junction components (E-cadherin, 
α-catenin and β-catenin) accumulate below the apical surface of dying
cells, forming an adhesion peak which coincides with local deforma- 
tion of the apical surface of the surrounding epithelial cells (Fig. 1c, 
Supplementary Video 2). These observations prompted us to hypoth- 
esize either the presence of an apico-basal pulling force generated by 
the dying cells or, alternatively, a pushing force generated by the dying 
cell’s neighbours. Therefore, we analysed myosin II dynamics. Inter- 
estingly, we detected an apico-basal acto-myosin cable-like structure 
(hereafter named ‘cable’) inside each dying cell (Extended Data Fig. 2b) 
that is formed just before the local deformation of the epithelium
surface (Fig. 1d, Supplementary Video 3). This myosin II cable is attached
to the junctional structure described above (Fig. 1e). Remarkably,
apical surface release coincides with myosin II cable and adhesion
peak detachment from the apical surface as the dying cell fragments 
(Fig. 1c, d, Extended Data Fig. 2c, d). Furthermore, when apoptosis is 
inhibited, neither the myosin II cable nor the apical deformation are 
observed (Extended Data Fig. 2e), suggesting that the myosin II cable 
constitutes the cellular apoptotic machinery responsible for the tran- 
sient deformation of the epithelium. 

We then asked whether the apico-basal myosin II cable is a general 
characteristic of apoptotic epithelial cells. By analysing myosin II dis- 
tribution in different Drosophila epithelial tissues, we revealed that an 
apico-basal myosin II cable also forms in apoptotic cells in other epithelia
(Extended Data Fig. 2h). We further asked if this general property
of apoptotic cells to generate an apico-basal myosin II cable is respons- 
ible for the local apical deformation observed around apoptotic cells. 
To test this, we induced ectopic apoptosis in the Drosophila wing which 
can be regarded as a naive tissue (as apoptosis normally occurs sporad- 
ically) and blocked myosin II function specifically in dying cells. While 
an apico-basal myosin II cable is formed in ectopic dying cells (Extended 
Data Fig. 2f), the local apical deformation around apoptotic cells is no 
longer visible when myosin II is inhibited (compare Fig. 1f to Fig. 1g, 
Extended Data Fig. 2g), indicating that the deformation strictly results 
from the myosin II dependent apoptotic force and is not generated by 
neighbouring cells. Together, these experiments demonstrate a fun- 
damental intrinsic in vivo property of yet non-fragmented apoptotic 
cells, namely their ability to produce a myosin II dependent apico-basal 
pulling force capable of transiently deforming adjacent cells (Fig. 1h). 

To determine how apoptotic cells control the reorganization of the 
remaining tissue, we characterized myosin II distribution and cell shape 
changes in the vicinity of apoptotic cells. Myosin II in this tissue is 
localized at the level of adherens junctions (Extended Data Fig. 3a, b). 
Interestingly, we observed that myosin II levels are increased along the 
apical membrane of apoptotic cell neighbours (Fig. 2a). Eventually, my- 
osin II and F-actin apical stabilization is found throughout the whole 
fold domain where apoptosis takes place compared to the segment do- 
main (Fig. 2b, Extended Data Fig. 3c, d) and is lost in absence of apo- 
ptosis (Fig. 2c and Extended Data Fig. 3e-g). Moreover, we observed 
that cells neighbouring apoptotic cells become progressively elongated 
and reduce their apical surface (Fig. 2d, Supplementary Video 4). Elon- 
gation then propagates from cell to cell, gradually spreading to the whole 
fold domain, thus generating a ring of stretched cells in which apoptosis 
occurs specifically (Extended Data Fig. 4a, b, Supplementary Video 5). 
These observations suggest that cell death is responsible for cell shape 
modification in the whole fold domain. We therefore compared cell shape 
dynamics of developing legs with or without apoptosis. In the control 
fold domain, cells elongate, decrease their apical surface and adopt a 
preferential orientation along the future fold (Fig. 2e). However, all 
these characteristics are lost when cell death is inhibited, demonstrating 
the essential role of apoptosis in determining cell morphology during 
folding (Fig. 2e and Extended Data Fig. 4c-e). Using laser ablation, we
tested whether cell elongation and myosin II stabilization at the level of
adherens junctions reveal an increase in tissue tension in the fold domain,
as previously described in other tissues'*. We found that the
release of tension between vertices was indeed much higher in the fold
domain where apoptosis takes place than in the segment domain (Extended
Data Fig. 3h). In addition, when cell death was inhibited, tension
in the fold domain was significantly lower than in the control situation
(Extended Data Fig. 3i). Altogether these results demonstrate that, during
leg folding, apoptosis induces a non-autonomous effect throughout
the fold domain leading to acto-myosin apical stabilization, a global enhancement
of tissue tension and cell shape changes (Fig. 2f).

Figure 1 | Apoptotic cells exert a transient apico-basal force upon adjacent
cells. a, Pupal leg disc from pre-fold stage (white pupae, WP) to late-fold stage
(WP + 3h). Arrowhead colours indicate fold progression (n = 10, 7, 10 and 10,
respectively). b, Average number of dying cells in the fold domain (n = 10, 6,
10 and 7, respectively). A, anterior; P, posterior. c, d, E-cadherin-green
fluorescent protein (E-Cad::GFP) (c, n = 24) and MRLC::GFP (d, n = 21)
dynamics in apoptotic cells (in red in c) from pre-fold stage leg discs. Time
0 = collapse of apoptotic cell apical surface; the dotted line and arrowhead
colour codes indicate equivalent stages of the apoptotic process. e, Co-localization
of the apoptotic myosin II cable (green arrowhead) with adherens
junctions (n = 19, white arrowhead) stained with anti-E-Cad. f, g, Ectopic
apoptotic cells with (f) or without (g) myosin II activity generated in the wing
disc. DN, dominant negative form. Red and open arrowheads point at presence
or absence of apical deformation, respectively (see quantification and
genotypes in Extended Data Fig. 2g). h, Schematics of apoptotic cell dynamics.

Figure 3 | Biophysical model of epithelium folding. a-c, In silico models
showing apoptotic pattern (insets), whole tissue (left) and cell shape (right,
including normalized area) in the absence of apoptosis (a) and after removing
apoptotic cells from the fold domain without (b) and with (c) an apico-basal
force generated by each dying cell and transmitted apically to neighbours
(noted ‘apoptotic forces’).

To test the role of apoptotic forces in folding, we constructed a physical
model based on the two-dimensional vertex model”’. In this model,
three interactions are considered: cell elasticity dependent on the cell
apical area, a contractility term dependent on the cell perimeter, and
line tension dependent on apical junction length. To take into account
all three dimensions of the leg epithelium, we have added an apico-basal
tension to those interactions (Extended Data Fig. 5a), and have
considered cell elasticity as a function of cell volume rather than cell area
while retaining the main characteristics of the original model (Extended 
Data Fig. 5b-e). Based on our observation of the developing leg disc, 
the leg tissue in our model is represented by a 50-cell circumferential 
cylinder, with the fold domain representing three rings of cells in which 
30 cells are programmed to die following the pattern of apoptosis in the 
leg (compare the cell death pattern in Fig. 1b and Extended Data Fig. 6a). 
In the absence of the apoptotic-dependent forces, the model indicates 
that the simple disappearance of 30 cells from a continuous ring-like 
domain is not sufficient to induce cell shape reorganization and to cre- 
ate an invagination at the tissue level (compare Fig. 3a and b). However, 
if a transient apico-basal force is applied in each dying cell, an invag- 
ination response is observed all around the cylindrical tissue, albeit 
irregular (Extended Data Fig. 8a). Now, if an increase of apical con- 
tractility is applied in two rows of apoptotic neighbours (representing 
the non-autonomous increase in tissue tension), a cell shape reorgan- 
ization is observed, although moderate (Extended Data Fig. 7a). Finally, 
if the transient pulling forces generated at a cellular scale are translated 
as increased contractility at the tissue scale we observe cell shape reor- 
ganization similar to that observed in the leg fold, along with regular 
and deeper folding (Fig. 3c). This shows that the added effect of both 
apoptotic forces is necessary and sufficient to induce folding in silico. 
Importantly, rising apoptotic cell number (Extended Data Fig. 6), apico- 
basal force strength (Extended Data Fig. 7) or the increase in apical 
contractility in apoptotic cells neighbours (Extended Data Fig. 8) leads 
to a gradation in cell and tissue shape changes and demonstrates the 
robustness of the model. 
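
The three interactions of the two-dimensional vertex model named above (area elasticity, perimeter contractility and junctional line tension) can be written down compactly. The Python sketch below evaluates them for a single polygonal cell in the commonly used quadratic form. It is an illustration under stated assumptions, not the authors' implementation: the parameter names are hypothetical, line tension is summed over all edges of the one cell (in a tissue each shared junction would be counted once), and the apico-basal tension and volume elasticity added in the three-dimensional model are not included.

```python
import math


def polygon_area(vertices):
    """Area of a simple polygon given as a list of (x, y) tuples (shoelace formula)."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def edge_lengths(vertices):
    """Lengths of the polygon edges, in vertex order."""
    n = len(vertices)
    return [math.dist(vertices[i], vertices[(i + 1) % n]) for i in range(n)]


def cell_energy(vertices, preferred_area, elasticity, contractility, line_tension):
    """Energy of one apical cell outline.

    elasticity/2 * (area - preferred_area)^2   area elasticity
    contractility/2 * perimeter^2              perimeter contractility
    line_tension * sum(edge lengths)           junctional line tension
    """
    area = polygon_area(vertices)
    edges = edge_lengths(vertices)
    perimeter = sum(edges)
    return (
        0.5 * elasticity * (area - preferred_area) ** 2
        + 0.5 * contractility * perimeter ** 2
        + line_tension * sum(edges)
    )
```

In this form, raising the contractility parameter for a cell penalizes a long perimeter more strongly, which is one way to picture the increase in apical contractility applied to the neighbours of apoptotic cells in the simulations.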

Furthermore, our model predicts that an apico-basal pulling force 
generated by sporadic apoptosis is not sufficient to modify tissue shape 
(Extended Data Fig. 6c, d). Moreover, the synergy of forces arising from 
several apoptotic cells concentrated in a restricted region appears nec- 
essary to generate a force strong enough to produce a fold (compare 
Extended Data Fig. 6f with Extended Data Fig. 6g). To test this in vivo 
we induced apoptosis ectopically in the wing pouch (that is, a flat tis- 
sue) and observed that a high concentration of apoptotic cells in a re- 
stricted region is indeed sufficient to drastically modify the shape of 
the epithelium through the creation of an ectopic fold (Fig. 4a, b, left 
and middle panels, and Extended Data Fig. 9a—b’). Consistently, no 
folding was observed in regions where only sporadic apoptotic cells were 
generated (Extended Data Fig. 9b’’’). Importantly, this tissue bending 
coincides with an apical stabilization of myosin II (Fig. 4c, middle panel, 
Extended Data Fig. 9d-e’) and F-actin (Extended Data Fig. 9c’). It also 
strictly relies on apoptotic myosin II since the expression ofa dominant 
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Figure 4 | Ectopic fold formation. a, Schematics of wing discs depicting the 
genetic contexts analysed in b and c. b, c, Wing discs in the absence (left panels) 
or presence (middle and right panels) of ectopic apoptosis in the ptc domain 
(pink), with (middle panels) or without (right panels) myosin II activity in 
dying cells, showing wing disc morphogenesis (large panels in b) and dying cell 
extrusion (close ups in b, white arrowheads) and myosin II distribution (stained 
by an anti-Sqh/MRLC, c; in sagittal sections, red and black arrowheads point to 
the presence or absence, respectively, of myosin II apical stabilization). 
Close-up three-dimensional reconstructions are presented in c to visualize the 
different pools of apical myosin II and apoptotic cells; red and open arrowheads 
denote ‘fold domain apical myosin II’ and ‘contractile ring of myosin II’, 
respectively (see also Extended Data Fig. 9e-e’’). b, n = 8, 13 and 10; c, n = 6, 7 
and 8 in left, middle and right panels, respectively. 
negative form of myosin II specifically in dying cells at the onset of apoptosis 
induction suppresses folding (Fig. 4a, b, right panels, and Extended 
Data Fig. 9a’’, b’’), whereas apoptotic cell extrusion from the 
epithelium remains normal (Fig. 4b, middle and right panels). Inter- 
estingly, the absence of fold in this context is concomitant with an ab- 
sence of myosin II apical stabilization in the whole domain of ectopic 
apoptosis (although the myosin II contractile ring involved in dying cell 
extrusion’® is still present, see right panel in Fig. 4c and Extended Data 
Fig. 9e’’), showing that this accumulation strictly depends on apopto- 
tic myosin II (compare Fig. 4c middle panel with right panel, quanti- 
fications are in Extended Data Fig. 9d). Altogether, these data strongly 
suggest that the apoptotic force resulting from the synergy of numerous 
and patterned apoptotic events constitutes the primary signal leading 
to epithelium folding. 

In this study, we elucidated a novel cellular mechanism of epithelial 
folding that relies on apoptosis. In Drosophila epithelia, cells are extre- 
mely columnar and apoptotic cells, in addition to the force generated by 
their extrusion from the epithelium as previously described in squam- 
ous epithelium”, create a myosin-II-dependent apico-basal intracellu- 
lar pulling force. Based on previous publications showing that transient 
mechanical forces are sufficient to induce myosin II recruitment at the 
apical surface of an epithelium*", we propose that apoptotic cells send 
a biomechanical signal to their neighbours, although we cannot exclude 
the myosin-II-dependent release of a molecular signal. The apoptotic 
signal triggers non-autonomous myosin II recruitment at the level of 
adherens junctions of neighbouring cells. The synergy of several apop- 
totic cells leads to a redistribution of myosin II, increased tension, apical 
constriction in the entire fold domain and subsequent tissue folding 
(Extended Data Fig. 10). Taken together, these results reveal that through 
the regulation of tissue tension, apoptotic cells can actively control tis- 
sue remodelling. 

Interestingly, apoptosis-dependent folding has also been described 
in vertebrates during neural tube bending. This bending is an import- 
ant step in neural tube closure and its failure can lead to spina bifida 
phenotypes'*”*. Therefore, it would be interesting to test if the cellular 
mechanisms described here are conserved in vertebrates, thus general- 
izing the mechanism of apoptosis-dependent epithelium folding. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS 

Fly stocks and genetics. The fluorescent reporters used are the following: E-Cad- 
KI(GFP)”’, ubi::E-Cad::GFP”, uas::alpha-catenin-TagRFP”', uas::SCAT3 (FRET 
reporter of caspases activity”), sqh[AX3];;sqh::sqhGFP[40] and sqh[AX3];sqh:: 
sqhGFP[42] (MRLC::GFP)”’, w;ap::Gal4,arm::arm-GFP;uas::mCD8-Cherry (generated 
using Bloomington stocks) and w,Dlg1-GFP (CC01936, from Flytrap). Dll::Gal4 
MD23, zfh2 LP30::Gal4 and UAS::Diap1 are described in FlyBase; UAS::p35 (insertions 
in chromosomes II and III) come from Bloomington. 

Stocks for ectopic cell death induction are y,w,HS::flp;act>y+>Gal4,uas::GFP 
and w;ptc::Gal4, uas::GFP; tub::Gal80ts (gifts from C. Benassayag), y,w,HS::flp;; 
act5C>CD2>Gal4 (generated using Bloomington stocks), w;uas::lifeactGFP;uas::reaper 
(gifts from X. Wang and from FlyBase), w; uas::hid(4),uas::DN-zip::GFP 
(gifts from H. Steller and D. Kiehart) and w; uas::DN-zip::GFP; uas::rpr. Briefly, the 
progeny of crosses of interest were grown on standard medium at 25 °C. Third instar 
larvae were heat shocked for 15-20 min at 38 °C and transferred to 29 °C for 
5.5-6 h before dissection. Following this treatment, a reproducible bias was observed, 
with clones essentially following the dorsal-ventral boundary in half of the 
wing pouch. When using ptc::Gal4 to induce cell death, crosses were performed at 
18 °C and the progeny were transferred to 30 °C for at least 5.5 h before dissection. 
Immunostainings. Primary antibodies obtained from Developmental Studies 
Hybridoma Bank were: rat anti-E-Cad (DCAD2, 1:50), mouse anti-Arm (N2 7A1, 
1:5) and mouse anti-Dlg (4F3, 1:200). Rabbit anti-cleaved Caspase 3 (9661, 1:100) 
and anti-cleaved Dcp1 (9578, 1:200) were obtained from Cell Signaling Techno- 
logies and chicken anti-beta-Gal (GTX77365, 1:1,000) was obtained from GeneTex. 
Mouse anti-Sqh (MRLC) and Guinea pig anti-Sqh1P (1:1,000) were gifts from 
R. Ward. Staining of the actin cytoskeleton was achieved using phalloidin-Rhodamine 
(1:200, Invitrogen) or phalloidin-Alexa647 (1:100, Interchim). Secondary 
antibodies coupled to Alexa-488 or Alexa-555 were obtained from Fisher Scientific 
and diluted 1:200, while secondary antibodies coupled to Cy5 were obtained from 
Jackson Laboratories and diluted 1:50. Briefly, for immunostainings, imaginal tissues 
were fixed using 4% paraformaldehyde (PFA) diluted in 1× PBS. Samples 
were washed and saturated using 1× PBS, 0.3% Triton X-100, 1% BSA (BBT). 
Primary antibodies were diluted in BBT and incubated overnight at 4 °C. Next, 
samples were washed and saturated in BBT, incubated with secondary antibodies 
(and phalloidin if required to stain F-actin) as indicated above, and subsequently 
washed with 1× PBS, 0.3% Triton X-100. Samples were mounted in Vectashield 
containing DAPI (Vector Laboratories) and analysed under a Zeiss LSM710 laser 
scanning microscope. A similar protocol was followed for immunostainings on 
embryos except that fixation was performed for 5 min in heptane:formaldehyde 
37% (1:1). For E-cadherin and MyoII, embryos were devitellinised manually and 
stained immediately. Note that in order to preserve wing morphology in ectopic 
cell death experiments, dissections were performed in Schneider medium, followed 
by fixation with 4% PFA diluted in Schneider medium. 
Time-lapse imaging. Leg discs were dissected in Schneider’s insect medium (from 
Sigma-Aldrich) supplemented with 2% FCS and 0.5% penicillin-streptomycin. Ecdysone 
(20-hydroxyecdysone, H5142, from Sigma-Aldrich) was stored as a stock solution 
of 200 µg ml⁻¹ at −20 °C and added to a final concentration of 2 µg ml⁻¹. 
For in vivo imaging, leg discs were transferred onto a coverslip in 15 µl of the above 
medium complemented with methyl cellulose (from Sigma-Aldrich) at a final concentration 
of 2.5% to obtain a more viscous medium. Spacers (Secure-Seal Imaging 
Spacers 0.12 depth from Sigma-Aldrich) were added between the coverslip and 
an air-permeable membrane (Lumox 25 from Sarstedt) to avoid compression of 
the tissue, and halocarbon oil was added on the sides of the spacer to protect from 
dehydration. Note that forceps and scissors, as well as the air-permeable membrane, 
were washed with ethanol before dissection. The membrane was rinsed with water 
and dried before use. 

Before imaging, dissected leg discs may be incubated for 30 min with acridine 
orange (final concentration 0.1 µg ml⁻¹) to reveal dying cells or with the red fluorescent 
lipid-binding dye FM4-64 (final concentration 36 µM). FM4-64 was also 
added at a concentration of 18 µM in the imaging medium. 

Imaging was essentially performed under an inverted laser scanning LSM710 
confocal (Zeiss). Under our conditions, ex vivo leg development reproduces mor- 
phologic stages of pupal leg development characterized on fixed tissues (including 
leg evagination, local folding and the pattern of cell death), albeit about two 
times more slowly. 

Importantly, we found that image stacks of 30-40 µm have to be taken every 
3 min with optimal sectioning (0.438 µm for a 40× objective with 1.3 aperture and 
the pinhole set to 1 AU) in order to spot apico-basal myosin II cables and adhesion 
peaks, as they are very dynamic structures. 

Note that apoptotic cells were identified using the caspase FRET sensor (SCAT3”, 
see section ‘Post-imaging analysis’ for details), acridine orange” or FM4-64 that 
strongly labels apoptotic bodies. 
Ex vivo culture and drug treatment. Q-VD-OPh is a broad-spectrum caspase 
inhibitor which binds to the active site of activated proteases, more efficiently than 
the commonly used Z-VAD caspase inhibitor and is described as non-toxic even at 
extremely high concentrations”. Discs from white pupae (WP) were dissected in 
complemented Schneider medium as described above and incubated from pre-fold 
to mid-fold stage with either Q-VD-OPh (from R&D Systems; final concentration 
500 µM) or DMSO (from Sigma-Aldrich; final concentration 0.5% in Fig. 2e and 
0.06% in other experiments) for 6 h (Fig. 2c) or 4 h (Fig. 2e and Extended 
Data Fig. 4d). Q-VD-OPh was stored as a stock solution of 
10 mM in either DMSO 100% (Fig. 2e) or in Schneider + 1.25% DMSO (other 
experiments). Throughout this study, we focused on the t4-t5 fold, the only fold 
that is exclusively formed during the pupal stage and for which progression can be 
easily followed due to disc evagination. At WP stage, formation of the t4-t5 fold is 
not initiated yet, while other tarsal folds are partially formed. Hence, in this particular 
context of apoptosis inhibition, we observe, as expected, a slight perturbation 
of the formation of the t3-t4 fold and a strong perturbation of the t4-t5 fold. 

Post-imaging analysis. The SCAT3 FRET probe was previously described”. It is an 
indicator of caspase-3 activation using fluorescence resonance energy transfer (FRET) 
between an enhanced cyan fluorescent protein (the donor) and an enhanced yel- 
low fluorescent protein (the acceptor) separated by a caspase cleavage site. FRET 
images of live leg discs were acquired with a Zeiss LSM710 microscope. A 458 nm 
laser was used to excite the sample. Cyan fluorescent protein (CFP) and yellow 
fluorescent protein (YFP) emission signals were collected through channel I (470- 
510 nm) and channel II (525-600 nm), respectively. CFP and YFP images were 
acquired simultaneously. Sequential acquisition of the CFP and YFP channels in alternating 
orders was tested and gave the same result as simultaneous acquisition. CFP 
and YFP images were processed with ImageJ software. The signal from a background 
region of interest was subtracted from the original image. A Gaussian smoothing 
filter was then applied to both channels. The final ratio image (YFP/CFP signal) 
was generated in ImageJ. 
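
The ratio-image steps described above (background subtraction, Gaussian smoothing, YFP/CFP ratio) can be sketched in Python as follows. This is a minimal illustration and not the ImageJ procedure used in the study: the function name `fret_ratio_image`, the Gaussian width `sigma`, the small threshold `eps` and the way the background region is passed are all assumptions made for the example.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def fret_ratio_image(cfp, yfp, background_roi, sigma=1.0, eps=1e-6):
    """Return a YFP/CFP ratio image after background subtraction and smoothing.

    background_roi is a (row_slice, col_slice) pair selecting a cell-free region.
    sigma (kernel width in pixels) and eps are illustrative assumptions.
    """
    cfp = np.asarray(cfp, dtype=float)
    yfp = np.asarray(yfp, dtype=float)

    # Subtract the mean intensity of the background region from each channel.
    cfp_corr = np.clip(cfp - cfp[background_roi].mean(), 0.0, None)
    yfp_corr = np.clip(yfp - yfp[background_roi].mean(), 0.0, None)

    # Smooth both channels with the same Gaussian kernel.
    cfp_s = gaussian_filter(cfp_corr, sigma=sigma)
    yfp_s = gaussian_filter(yfp_corr, sigma=sigma)

    # Ratio image; pixels with negligible donor signal are set to NaN.
    ratio = np.full(cfp_s.shape, np.nan)
    valid = cfp_s > eps
    ratio[valid] = yfp_s[valid] / cfp_s[valid]
    return ratio
```

With a sensor of this design, caspase cleavage of the linker separates donor and acceptor, so a drop in the YFP/CFP ratio would be expected in apoptotic cells.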

Zen software (Zeiss) was used to generate three-dimensional reconstructions and 
sagittal views of tissues. Images were processed in Adobe Photoshop CS5 or ImageJ. 
Automated image analysis and quantifications of cell shape changes. Matlab, 
DipImage, CellProfiler and Icy were used for automatic segmentation and quantification 
of cell shape (Fig. 2e and Extended Data Fig. 4d). A three-dimensional 
median filter was applied to confocal images before segmentation. A Z projection 
based on the maximum intensity of each Z stack was then computed. An adaptive 
threshold based on the Otsu method was applied to the Z projection images to define 
the outline of cells and produce a binary image. The distance 
transform was then applied to the binary image to calculate the distance from the cell 
membrane. The local maxima of the distance transform, which mark the centre of 
each cell, were selected as seeds. For cells with high anisotropy, the calculation of 
local maxima induced some errors, which were manually corrected. The coordinates 
of each local maximum were used as a seed for a watershed algorithm function in 
Matlab to obtain a label matrix allowing the quantification of each cell object. Cells 
situated outside the domain of interest were rejected. To perform statistics, we 
calculated the maximum and the minimum vertical coordinates based on the coordinates of the 
centre of outlined cells. Then 12 domains were defined based on these coordinates. 
The anisotropy was defined as the ratio between the lengths of the 
principal axes of an ellipse fitting the cell contour. 
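
The last two steps above (anisotropy of each cell and assignment of cells to 12 domains) can be illustrated with a short Python sketch. It is not the Matlab code used for the analysis. As a simplifying assumption, the ellipse is taken to be the one with the same second moments as the cell's pixels rather than a fit to the contour, and the function names `cell_anisotropy` and `assign_domains` are hypothetical.

```python
import numpy as np


def cell_anisotropy(pixel_coords):
    """Ratio of major to minor axis of the ellipse with the same second moments.

    pixel_coords: (N, 2) array of pixel coordinates belonging to one cell.
    Assumption: a moment-matched ellipse stands in for a contour fit.
    """
    coords = np.asarray(pixel_coords, dtype=float)
    cov = np.cov(coords, rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)  # eigenvalues in ascending order
    return float(np.sqrt(eigvals[1] / eigvals[0]))


def assign_domains(centre_y, n_domains=12):
    """Bin cell centres into equal-width domains along the vertical coordinate."""
    y = np.asarray(centre_y, dtype=float)
    edges = np.linspace(y.min(), y.max(), n_domains + 1)
    # digitize returns 1-based bin indices; the maximum is put in the last domain.
    return np.clip(np.digitize(y, edges) - 1, 0, n_domains - 1)
```

An isotropic cell gives an anisotropy close to 1, and an elongated cell gives a larger value.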
Measurements of actomyosin levels. To quantify junctional acto-myosin (Extended 
Data Fig. 3c), acto-myosin intensity was measured for each leg disc using ImageJ 
by drawing a circular region of interest of 1 µm² in 8 individual junctions in the 
fold domain (t4-t5) and in 8 individual junctions in the segment domain (t4). The 
measurements made in the fold were then normalized to the mean value of acto-myosin 
intensity in the segment domain, independently in each leg disc (to ensure 
the presence of an internal control). As a consequence, the values represent the 
differential accumulation of junctional acto-myosin in the fold compared to the 
segment domain. 

To quantify acto-myosin per surface unit (Extended Data Fig. 3d, e, g), acto-myosin 
intensity was measured for each leg disc using ImageJ by drawing a segmented 
line either in the fold (t4-t5) or in the segment (t4) domain. These values were 
divided by the length of the line and each measure in the fold was then normalized 
by the corresponding segment domain value (to ensure the presence of an internal 
control). As a consequence, the values represent the differential accumulation per 
surface unit of acto-myosin in the fold compared to the segment. 
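
The two normalizations described in this section amount to simple ratios, sketched below in Python. The function names and argument layout are hypothetical and chosen only for illustration; intensities are assumed to have been measured beforehand (for example in ImageJ).

```python
import numpy as np


def normalized_junction_signal(fold_values, segment_values):
    """Fold junction intensities divided by the mean segment intensity of the same disc."""
    fold = np.asarray(fold_values, dtype=float)
    return fold / np.mean(segment_values)


def normalized_signal_per_length(fold_intensity, fold_length,
                                 segment_intensity, segment_length):
    """Intensity per unit line length in the fold, relative to the segment domain."""
    return (fold_intensity / fold_length) / (segment_intensity / segment_length)
```

A value above 1 indicates more signal in the fold than in the segment domain of the same disc, which is what makes the segment an internal control.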

Statistics. To calculate P-values, we used the non-parametric Wilcoxon rank sum 
test (also called Mann and Whitney test) since samples do not follow a normal distribution 
and do not have equal variances in Extended Data Fig. 3c-e, g, Extended 
Data Fig. 4d and Extended Data Fig. 9d. The null hypothesis is that the measures are 
samples from continuous distributions with equal medians. The test considers that 
the samples are independent and, in Extended Data Fig. 4d, it takes into account 
that the control and Q-VD-OPh samples can have different sizes. For cell shape characteristics 
(anisotropy, area and orientation), values are represented by box plots 
(the red line represents the median). For acto-myosin level quantifications, values 
are represented as mean values with standard error bars. 
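
As an illustration of the test named above, a two-sided rank sum comparison of two independent samples of unequal size can be run with SciPy's `mannwhitneyu`. This is a sketch, not the software used for the published P-values, and the wrapper name `compare_conditions` is hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def compare_conditions(control, treated):
    """Two-sided Wilcoxon rank sum (Mann-Whitney U) test for two independent samples."""
    control = np.asarray(control, dtype=float)
    treated = np.asarray(treated, dtype=float)
    result = mannwhitneyu(control, treated, alternative="two-sided")
    return result.statistic, result.pvalue
```

The two samples do not need to have the same number of measurements, which matches the situation described for the control and Q-VD-OPh conditions.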

Photo-ablation experiments. Laser-ablation experiments in prepupal leg imaginal 
discs were performed with a pulsed Q-switched microchip double-frequency Nd:YAG 
laser (λ = 532 nm, 550 ps, 7 kHz, 3.5 µJ per pulse). The laser beam was focused 
through a high numerical aperture oil-immersion lens (×60 C-Apochromat NA 
1.4, Leica) to produce an experimental beam waist around 600 nm in the focal plane. 
Photo-disruption was produced in the focal plane due to plasma generation in 
the middle of the adherens junctions after a total of 21 laser pulses at 3 kHz. A pair of 
galvanometric mirrors was conjugated to the back focal plane of the microscope 
objective lens to steer the ablation beam and allow simultaneous ablation of two junctions. 
We used an inverted microscope (Leica DMI6000B) with wide-field illumination 
for live fluorescence imaging of adherens junctions labelled with the 
arm::GFP fusion protein. Adherens junctions were placed in the centre of the field 
to improve reproducibility. METAMORPH coupled to ILAS software controlled 
the laser and the microscope (Roper Scientific SA). Images were taken 
every second over a period of 30 s. Note that for laser ablations in the absence of cell 
death (incubation with Q-VD-OPh, Extended Data Fig. 3i), expression of a membrane-bound 
cherry protein under the control of the apterous driver (w;ap::Gal4, 
arm::arm-GFP;uas::mCD8-Cherry; for expression pattern, see Extended Data Fig. 4c) 
was used to visualize the fold domain since cell shape changes do not occur in these 
conditions. 

Modelling. Our apical junction network model is a three-dimensional generalization 
of the Farhadifar et al. model’®. In order to allow non-planar interactions, 
the surface elasticity is replaced by a volume elasticity. The apico-basal 
interaction is modelled as an energy proportional to cell height, with the same linear 
form as the apical line tension. The initial epithelium is represented as a hexagonal 
lattice of cells over a cylinder. Next, each cell follows a division process with a random 
division plane orientation. The order of cell divisions on the epithelium is 
chosen randomly, uniformly across the epithelium. Division is modelled by first 
increasing the cell equilibrium volume and finding the local energy minimum, and 
second by dividing the cell and finding the new local energy minimum. Then, apoptotic 
cells are chosen randomly in a region around the centre of the epithelium, 
with a biased distribution reproducing the in vivo cell death pattern. Apoptosis is 
performed by gradually diminishing the cell preferred volume, increasing its contractility 
and raising the apico-basal tension term (ten steps are performed). At each 
step, the local energy is minimised by a gradient descent strategy. After the first 
step of the first apoptotic cell is performed, the second cell starts its apoptosis, and 
so on. Once ten steps are performed for one cell, it is removed from the tissue, such 
that no centripetal force remains at that point. When removing the apoptotic cell, 
its neighbours are rearranged through a series of type 1 intercalations (see Farhadifar 
et al.'*). To model the increase in acto-myosin activity in the neighbouring cells, their 
contractility is increased by a factor whose amplitude 
decreases exponentially with the distance to the apoptotic cells. After a cell has been 
eliminated and its neighbours rearranged, the whole tissue is brought back to equilibrium 
by successively computing the local energy minimum for all the cells of the 
tissue in a random order. Technically, the epithelium is described as an oriented 
graph in the graph-tool library’® (http://graph-tool.skewed.de) in the Python programming 
language. Energy minimisation is performed with the scipy library” 
(http://scipy.org), using the Broyden, Fletcher, Goldfarb, and Shanno bound-constrained 
minimisation algorithm provided by this library. The source code for the 
model is released under the GNU General Public Licence and is available on GitHub 
at https://github.com/glyg/leg-joint and http://dx.doi.org/10.5281/zenodo.13386. 
Extensive details on the numerical method as well as a complete derivation of the 
gradient can be found at this address. 
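
To make the energy minimisation step concrete, the sketch below relaxes a single planar cell under a Farhadifar-type energy (area elasticity, line tension and perimeter contractility) using SciPy's L-BFGS-B optimiser. It is a deliberately reduced illustration, not the released leg-joint code: it handles one isolated two-dimensional cell instead of a three-dimensional tissue, relies on a numerically estimated gradient, and all function names and parameter values are assumptions.

```python
import numpy as np
from scipy.optimize import minimize


def polygon_area_perimeter(vertices):
    """Area (shoelace formula) and perimeter of a closed polygon given as (N, 2)."""
    x, y = vertices[:, 0], vertices[:, 1]
    x_next, y_next = np.roll(x, -1), np.roll(y, -1)
    area = 0.5 * np.abs(np.sum(x * y_next - x_next * y))
    edge_lengths = np.hypot(x_next - x, y_next - y)
    return area, edge_lengths.sum()


def cell_energy(flat_vertices, pref_area=1.0, elasticity=1.0,
                line_tension=0.12, contractility=0.04):
    """Energy of one apical cell face; parameter values are illustrative only."""
    vertices = flat_vertices.reshape(-1, 2)
    area, perimeter = polygon_area_perimeter(vertices)
    return (0.5 * elasticity * (area - pref_area) ** 2   # area elasticity
            + line_tension * perimeter                   # line tension on edges
            + 0.5 * contractility * perimeter ** 2)      # perimeter contractility


def relax_cell(vertices, **params):
    """Find a local energy minimum with L-BFGS-B (gradient estimated numerically)."""
    x0 = np.asarray(vertices, dtype=float).ravel()
    result = minimize(lambda x: cell_energy(x, **params), x0, method="L-BFGS-B")
    return result.x.reshape(-1, 2), result.fun
```

In this reduced setting, lowering `pref_area` and raising `contractility` before calling `relax_cell` again mimics one step of the gradual shrinkage described for apoptotic cells.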

The apical contractility and linear tension parameters Γ and Λ were chosen so 
that the relation between cell apical area and number of neighbours follows the 
same linear increase as observed by Farhadifar et al.’ in the wing disc (compare 
Fig. 2g of Farhadifar et al.'° with Extended Data Fig. 5b of the present work). We 
explored a range of apoptotic cell number, apico-basal force amplitude and contractility 
increase values (Extended Data Figs 6, 7, 8) and observed that cell morphology 
in the fold is best reproduced with amplitudes of 1 Γ and 1 Λ. The fact that 
they are of the same order of magnitude as the apical interactions gives us confidence 
that the chosen parameters are realistic. Furthermore, our tissue has a consistent 
morphology with or without apoptosis, increased apical contractility or 
apico-basal force, meaning that the balance of forces is coherent and equilibrated. 

For cell shape analysis, a sector centred 30° above the ventral-most part of the 
cylindrical epithelium was extracted. Radius and area were normalized with respect 
to their average values on the simulated epithelium in the absence of apoptosis. 
Extended Data Figure 1 | Spatio-temporal pattern of fold formation and 
hallmarks of apoptosis. This figure is associated with Fig. 1. a, Time-lapse 
images and schematics of the distal region of a Dlg::GFP leg disc from pre-fold 
stage (WP) showing the progression of the t4-t5 fold (white arrowheads). 
Acridine orange was used to stain dying cells (green) and FM4-64 to stain 
membranes (red). Note the presence of apoptotic cells in the fold region 
(coloured in grey in the schematics below, n = 27). The t4-t5 domain is 
indicated by a black line on the schematics. b, Three-dimensional 
reconstruction of the distal region of Dlg::GFP pupal legs undergoing fold 
morphogenesis. The colour code indicates tissue depth. Images show legs at 
different stages of development, from WP to WP + 3h. Throughout this 
study, we have focused on the t4-t5 fold, the only one that is exclusively formed 
during pupal stage and for which progression can be easily followed due to disc 
evagination. For each time point, top and bottom panels show dorsal and 
ventral views of leg discs respectively, and the t4-t5 domain is indicated by a 
white line. Note that the fold is initiated in the most ventral part of the leg, 
then progresses laterally (arrows) to end in the most dorsal part of the leg 
(n = 10 for WP, n = 7 for WP + 1h, n = 10 for WP + 2h, n = 10 for 
WP + 3h). c, High magnification images from a time-lapse video during 
apoptosis showing caspase activity, revealed by the FRET construct SCAT3 
(top), and the outline of cell membranes, revealed by FM4-64 staining 
(bottom) (n = 19). These images illustrate that the classical apoptotic stages, 
including shrinkage, blebbing (hollow arrowheads) and fragmentation (black 
arrowheads), are recapitulated in the developing Drosophila leg epithelium. 
Black outline (top) and red false-colour (bottom) highlight the apoptotic cell. 
Another apoptotic cell (outlined in white and coloured in pink) has also 
just turned on the apoptotic pathway. Note that in both cases the apoptotic 
pathway is turned on before visible morphological change. 
Extended Data Figure 2 | Adherens junctions and acto-myosin cytoskeleton 
dynamics in apoptotic cells. This figure is associated with Fig. 1. a, Time-lapse 
images of a pupal leg disc expressing αCat-TagRFP in the ap domain 
(which encompasses the t4-t5 fold) and the FRET construct SCAT3 to reveal 
caspase activity and thus visualize apoptotic cells (n = 10). The apoptotic 
cell outline is visible on the sagittal section and represented on the scheme. 
The position of the sagittal section is indicated by a black line on the apical top 
view to the left. Note the reduction of the apical surface of the apoptotic cell 
(apical top views on the right). The apoptotic cell is highlighted in red on 
both panels. b, Acto-myosin cable (green arrowhead) observed in a single 
apoptotic cell from a pupal leg disc (n = 6). MRLC::GFP is in green, F-actin 
(stained with phalloidin) is blue and cleaved Dcp1 is red. c, Sagittal sections of 
an apoptotic cell (visualized with the FRET construct SCAT3) before and after 
fragmentation, illustrating the apoptotic adhesion peak (n = 27, red 
arrowhead) and its splitting up (hollow red arrowhead) when the apoptotic cell 
detaches at fragmentation. d, Time-lapse images of a MRLC::GFP pupal leg 
disc showing apical surface release upon apoptotic cell fragmentation (time- 
point 69 min, n = 12). Fragmentation is clearly visible in the z section (top) 
showing membrane staining of the apoptotic fragments with FM4-64 (white 
arrowheads) (the position of the z section is indicated by a black dotted line in 
the sagittal view). The myosin II cable is indicated by open red 
arrowheads in sagittal sections. e, MRLC::GFP leg discs incubated in either 
DMSO (control) or Q-VD-OPh (cell death inhibition), showing the absence 
of the apico-basal myosin II cable in the absence of cell death (right, n = 0 out 
of 30). The arrowhead points to the myosin II cable in the control (n = 14 
out of 30). f, A single apoptotic cell (same cell shown in Fig. 1f, identified by 
GFP expression, red) generated by ectopic expression of reaper in the wing 
disc (from y,w,hs::flp; act-frt-y+-frt-Gal4, uas::GFP / uas::lifeactGFP; uas::rpr 
larvae). The green arrowhead points to the myosin II cable stained by anti-phospho-Sqh/MRLC 
(green, n = 11). The dotted red line outlines the dying 
cell. g, Percentage of individual apoptotic cells (with or without myosin II 
activity) with apical deformation. This quantification is associated with Fig. 1f, 
g (n = 68 and n = 106, respectively). Genotype for ectopic cell death is 
y,w,hs::flp; act-frt-stop-frt::Gal4, uas::GFP / uas::lifeactGFP; uas::rpr and 
genotype for ectopic cell death without myosin II activity specifically in 
apoptotic cells is y,w,hs::flp; uas::DN-zip::GFP, uas::hid / act-frt-stop-frt::Gal4. 
Note that, in the latter condition, myosin II activity is maintained in the 
neighbouring cells. h, Sagittal sections close-ups of MRLC::GFP wing and 
antennal discs from 3rd instar larvae and a MRLC::GFP stage 11 embryo 
showing that an apico-basal structure of myosin II (green arrowheads) is 
formed in dying cells in each of these different tissues (n = 6, 7 and 7, 
respectively). Cleaved Dcp1 (activated caspase) is in red, E-Cad in blue (wing) 
and Arm/βCat in blue. The region shown in each close-up is indicated by a red 
line on schematics on the left. 
Extended Data Figure 3 | Myosin II and tension dynamics during fold 
formation. This figure is associated with Fig. 2. a, Close-up views of cells 
expressing MRLC::GFP, showing that, in the leg disc, myosin II is preferentially 
accumulated at the cortex (green) rather than at the medio-apical web 
(magenta). Relative intensity of myosin II at both levels is shown on separate 
panels (n = 22). b, Sagittal section of a leg disc from pre-fold stage (WP) 
showing that myosin II (visualized by MRLC::GFP construct, in green) co-localizes 
with E-Cad at adherens junctions (labelled in red) (n = 22). 
c, Quantification of myosin II::GFP (left) and F-actin (right) levels at individual 
junctions in fold (t4-t5) and segment (t4) domains of leg discs from pre-fold 
(young WP + 1h) and late-fold (young WP + 4h) stages (n = 144 and 
n = 112, respectively). Values are represented as mean values with error bars 
representing standard errors. The intensity of the signal in the fold has been 
normalized with the mean intensity in the segment domain. We used the 
non-parametric Wilcoxon rank sum test (also called Mann and Whitney test). 
d, e, Quantification of myosin II::GFP (left) and F-actin (right) levels per surface 
unit in fold (t4-t5) and segment (t4) domains of (d) leg discs from pre-fold 
(young WP + 1h) and late-fold (young WP + 4h) stages (n = 23 and n = 18 
measurements respectively) or (e) in dissected discs cultured from pre-fold to 
mid-fold stage with either DMSO (control) or Q-VD-OPh (cell death 
inhibition) (n = 32 and n = 24 measurements respectively). Values are 
represented as mean values with error bars representing standard errors. The 
intensity of the signal in the fold has been normalized with the mean intensity in 
the segment domain. We used the non-parametric Wilcoxon rank sum 
test (also called Mann and Whitney test). f, Sagittal sections of pupal leg 
discs from mid-fold stage (WP + 2h) of the following genotypes: 
Dll-Gal4[MD23] (ctl, n = 8), UAS-DIAP1; LP30-Gal4 (Diap1, n = 27), 
Dll-Gal4[MD23] / UAS-p35 (p35, n = 11) and Dll-Gal4[MD23] / UAS-p35; UAS-p35 
(p35 x2, n = 15) (at early pupal stages, LP30-Gal4 and Dll::Gal4[MD23] show 
similar expression patterns, namely expression in the distal tibia and in all tarsal 
segments”’). The stabilization of myosin II and F-actin in the t4-t5 fold 
observed in the control (red arrowheads) is reduced or absent when cell death is 
inhibited (open arrowheads point to the t4-t5 domain in the context of cell 
death inhibition). Myosin II is detected using anti-Sqh/MRLC antibody. The 
fold domain is false-coloured in pink and the segment domain in yellow. 
g, Quantification of myosin II (using anti-Sqh antibody, left) and F-actin (right) 
levels per surface unit in fold (t4-t5) and segment (t4) domains in leg discs 
from mid-fold stage (WP + 2h) in control (Dll-Gal4) and cell death inhibition 
(Dll>p35x2) contexts (n = 25 and n = 28 measurements respectively). Values 
are represented as mean values with error bars representing standard errors. 
The intensity of the signal in the fold has been normalized with the 
mean intensity in the segment domain. We used the non-parametric Wilcoxon 
rank sum test (also called Mann and Whitney test). h, Laser ablation 
experiments of apical membranes in arm::GFP leg discs in the segment domain 
versus the fold domain where apoptosis takes place. Discs were dissected 
and cultured ex vivo from pre-fold stage (WP) to mid-fold stage. Note the 
increase in the length of vertex release in the fold domain compared to the 
segment domain (n = 4). i, Laser ablation experiments of apical membranes in 
the fold domain of arm::GFP leg discs incubated from pre-fold stage (WP) 
to mid-fold stage with either DMSO (control, n = 5) or Q-VD-OPh (cell death 
inhibition, n = 4). Note that vertex release in the fold is reduced in the absence 
of apoptosis. h, i, Right panels, graphs representing quantifications of the 
increase in distance between vertices following laser cut, revealing apoptosis- 
dependent increased cellular tension in the fold domain versus the segment 
domain. Examples of ablated cells before (green) and after (magenta) laser cut 
are shown on the left. Orange bars represent the region where the laser cut has 
been performed. Error bars correspond to the standard error of the mean. 
Extended Data Figure 4 | Cell shape dynamics during fold formation. This 
figure is associated with Fig. 2. a, Three-dimensional reconstructions and 
schematics of leg imaginal discs from pre-fold stage (WP) and mid-fold stage 
(WP + 2h) stained with E-Cad in white and cleaved Dcp1 (revealing caspase 
activity) in red (n = 11 for each). High magnifications of the fold domain 
(surrounded in red) are shown on the right hand side of each panel. 
Arrowheads indicate apoptotic cells. b, Cell shape dynamics during fold 
formation (n = 8). c, Schematic of a pupal leg disc showing the apterous 
domain (in red) in the t4 tarsal segment and overlapping the t4-t5 fold (top). 
The bottom panel shows a typical result of automated cell outline extraction 
of a pupal leg disc double-stained for adherens junctions and the apterous 
domain. Cells from this domain have been subdivided into 12 sections along the 
proximo-distal axis (see Methods). d, Anisotropy, area and orientation of cells 
from DMSO (control, n = 7) or Q-VD-OPh (cell death inhibition, n = 8) leg 
discs from fold domain sections 10 and 11 (see Extended Data Fig. 4c) were 
quantified and values represented as box plots. Discs were dissected and 
incubated from pre-fold stage (WP) to mid-fold stage. We used the non-parametric 
Wilcoxon rank sum test (also called Mann and Whitney test). 

e, Three-dimensional reconstruction images of anti E-Cad stained pupal leg 
discs from mid-fold stage (WP + 2h) (top left) of the following genotypes: 
LP30::Gal4 (control, ctl, n = 11), uas-DIAP1; LP30::Gal4 (Diap1, n = 22), 
Dll::Gal4[MD23] / uas-p35 (p35, n = 9) and Dll::Gal4[MD23] / uas-p35; 
uas-p35 (p35 x2, n = 8) (at early pupal stages, LP30::Gal4 and Dll::Gal4[MD23] 
show similar expression patterns, namely expression in the distal tibia and in 
all tarsal segments). For each condition, cell outlines were extracted and 
anisotropy, apical surface area and orientation of cells from the fold domain 
were quantified and colour-coded. Note that when cell death is inhibited, 
anisotropy is reduced, apical surfaces are increased and cell preferential 
orientation is perturbed compared to the control situation. 
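The rank sum comparison named in this legend can be sketched as follows. This is a minimal pure-Python illustration of the Mann-Whitney U statistic with average ranks for ties; the function names are hypothetical, the example values are invented, and it is not the authors' analysis code (a P value would additionally require the exact or normal-approximation distribution of U).

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two independent samples x and y."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Invented example values (not data from the paper).
control = [1.0, 2.0, 3.0]
treated = [4.0, 5.0, 6.0]
print(mann_whitney_u(control, treated))
```

The smaller of the two U values is the one usually compared against a critical value or converted to a P value.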
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Extended Data Figure 5 | Three-dimensional epithelial cell model. This 
figure is associated with Fig. 3. a, Each cell is represented as an apical surface 
delimited by apical junctions. Each cell interacts with its neighbours through 
the apical junctions at its borders. In the original work by Farhadifar et al., 
three interactions are considered: (1) the tension opposing the elongation 
of a particular junction edge, with energy increasing with edge length; (2) a 
contractility, with energy proportional to the cell perimeter squared, used to 
model cell constriction; (3) a surface elasticity bringing the apical cell area 
back to a preferred area. As in the original work, the model can produce cell 
division and type 1 and type 3 transitions, to which we added apoptosis. Yet in 
our case, contrary to the work of Jülicher and colleagues, we must also take into 
account non-planar modifications of the epithelial sheet. To this end, we 
modified the elastic area interaction to take into account a constraint on cell 
volume. The new interaction is termed volume elasticity and transmits 
contractions and dilations of the apical sheet along the apical-basal axis. The 
associated energy is proportional to the square of the difference between the 
current cell volume and a preferred volume. b, Average areas of cells as a 
function of the number of neighbouring cells in the epithelium before 
apoptosis, to be compared with Fig. 2g in Farhadifar et al. Our tissue shows a 
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similar trend of growing area with the number of sides, in good quantitative 
agreement with Farhadifar et al. c, Distribution of the number of 
neighbouring cells (or, equivalently, of the number of cell sides), to be 
compared with Fig. 2f of Farhadifar et al.; once again, we are in good 
quantitative agreement with their model. d, Ground state diagram of the 
vertex model, comparing two-dimensional and three-dimensional hexagonal 
network boundaries (we restricted ourselves to Λ > 0 and Γ > 0 regions). The 
black dot indicates the chosen values for the line tension and contractility 
parameters, which are the same as case I in the Farhadifar et al. article. 

e, Variation of the normalized energy of a regular epithelium comprised of 
identical hexagonal cells as a function of a scale factor δ. Solid lines, analytical 
calculation; dotted line, average cell energy for a cylindrical tissue of 32 cells in 
diameter by 29 cells long. A scaling of δ = 1 means that the cells are at their 
equilibrium volume in the absence of elasticity and contractility, and thus 
corresponds to the minimum of the blue lines (volume elasticity). Green 
lines correspond to line tension and yellow lines to contractility. The 
discrepancy between theoretical and computed values is due to the effect of 
cells lying at the border of the cylindrical epithelium. 
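The three energy contributions described in this legend (line tension, perimeter contractility and volume elasticity) can be illustrated for one regular hexagonal cell. The sketch below rests on assumptions not stated in the legend: the scale factor multiplies every length, including cell height; each of the six edges is shared by two cells; and the parameter values are purely illustrative. All function and parameter names are hypothetical.

```python
import math


def hexagon_area(side):
    """Apical area of a regular hexagon with the given side length."""
    return 1.5 * math.sqrt(3.0) * side ** 2


def hexagon_energy(scale, side0=1.0, height0=1.0,
                   line_tension=0.12, contractility=0.04, k_volume=1.0):
    """Per-cell energy terms for a regular hexagonal cell.

    Assumption: `scale` multiplies both the side length and the height,
    so the preferred volume is reached at scale == 1.
    """
    side = scale * side0
    height = scale * height0
    perimeter = 6.0 * side
    volume = hexagon_area(side) * height
    preferred_volume = hexagon_area(side0) * height0
    return {
        # Each edge is shared by two cells, so a cell owns 3 edge lengths.
        "tension": line_tension * 3.0 * side,
        # Contractility grows with the square of the cell perimeter.
        "contractility": 0.5 * contractility * perimeter ** 2,
        # Volume elasticity penalises deviation from the preferred volume.
        "volume": 0.5 * k_volume * (volume - preferred_volume) ** 2,
    }


for scale in (0.8, 1.0, 1.2):
    terms = hexagon_energy(scale)
    print(scale, {name: round(value, 4) for name, value in terms.items()})
```

Under these assumptions the volume term vanishes at a scale of 1, whereas the tension and contractility terms keep decreasing as the cell shrinks, so the total energy minimum sits below a scale of 1 whenever both parameters are positive.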
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Extended Data Figure 6 | Effect of the apoptosis pattern on epithelium 
folding in silico. This figure is associated with Fig. 3. a, Representation of the 
in silico cell death pattern. Note the similarity with the in vivo distribution 
of apoptosis observed in biological samples (Fig. 1b). Nonetheless, the 
representation is not exactly comparable, since the cell death pattern is represented 
relative to a developmental stage in the biological samples, whereas in the 
model it is represented relative to the number of dead 
cells generated by the simulation, because the time scale is not taken 
into account in the model. Left, representation of the in silico distribution 
of apoptotic cells around the fold domain. b-g, For each panel, from left to right 
are represented (1) a scheme of the fold domain showing the pattern of 
apoptosis, (2) a three-dimensional representation showing whole tissue shape 
and (3) for each condition, the corresponding cell outlines extracted from 
three-dimensional simulations in which anisotropy, area and orientation of 
cells from the fold domain are colour-coded. b-f, In silico models showing 
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whole tissue shape with an increasing number of apoptotic cells following the 
in vivo pattern of apoptosis. Note the gradual increase in anisotropy, gradual 
decrease of cell area and the preferential orientation with the gradual increase in 
the number of dying cells. The three-dimensional simulations in b and f are 
those presented in Fig. 3a and Fig. 3c, respectively. The three-dimensional 
simulation in f (framed in a blue rectangle in Extended Data Fig. 6, Extended 
Data Fig. 7 and Extended Data Fig. 8) corresponds to 30 apoptotic cells, with an 
apico-basal force of 1Λ and an increase in apical contractility of 1Γ. g, In silico 
model for a random pattern of apoptosis, with all other parameters similar 
to f. h, The mean values of radius, anisotropy, area and orientation of cells from 
the whole fold domain, defined as a ±1 µm region around the fold centre 
of simulations b-g, are represented as box plots. The number of cells considered 
is n = 33, 38, 45, 61 and 98 for 0, 5, 10, 20 and 30 cells, respectively. This number 
varies owing to changes in cell density in the domain. 
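Area, anisotropy and orientation of a cell can be derived from its extracted outline. The sketch below is one possible definition, not necessarily the one used for these figures: area comes from the shoelace formula, and anisotropy and orientation come from the second-moment matrix of the outline vertices, with anisotropy taken (as an assumption) to be one minus the ratio of minor to major axis. The function names are hypothetical.

```python
import math


def polygon_area(points):
    """Shoelace area of a simple polygon given as [(x, y), ...]."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def shape_descriptors(points):
    """Return (area, anisotropy, orientation in radians) of an outline.

    Assumption: anisotropy = 1 - minor/major axis of the vertex
    second-moment matrix; 0 is isotropic, values near 1 are elongated.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Eigenvalues of the symmetric 2x2 matrix [[sxx, sxy], [sxy, syy]].
    spread = math.hypot(sxx - syy, 2.0 * sxy)
    major = (sxx + syy + spread) / 2.0
    minor = max((sxx + syy - spread) / 2.0, 0.0)
    anisotropy = 1.0 - math.sqrt(minor / major) if major > 0 else 0.0
    # Angle of the major axis relative to the x axis.
    orientation = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return polygon_area(points), anisotropy, orientation


# Hypothetical outline: a 4 x 1 rectangle elongated along x.
print(shape_descriptors([(0, 0), (4, 0), (4, 1), (0, 1)]))
```

A vertex-based moment matrix is a simplification; an area-weighted moment would be more robust for outlines with unevenly spaced vertices.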




[Extended Data Figure 7 image: panels a-e. Panel a label, apico-basal force; panel e x axes, apico-basal force (units of Λ).] 


Extended Data Figure 7 | Effect of apico-basal force intensity on epithelium 
folding in silico. This figure is associated with Fig. 3. a-d, In silico models 
showing whole tissue shape following increasing values of apico-basal apoptotic 
force (30 apoptotic cells, apical contractility: 1Γ). For each panel, from left to 
right are represented (1) a scheme of the strength of the apico-basal force 
applied, (2) a three-dimensional representation showing whole tissue shape 
and (3) for each condition, the corresponding cell outlines extracted from 
three-dimensional simulations in which anisotropy, area and orientation of 
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cells from the fold domain are colour-coded. Note the gradual increase in 
anisotropy, gradual decrease of cell area and the preferential orientation with 
the gradual increase of the apico-basal force. e, The mean values of radius, 
anisotropy, area and orientation of cells from the whole fold domain of 
simulations a-d are represented as box plots. The ∅ symbol corresponds to 
the condition in the absence of apoptosis (from Extended Data Fig. 6b). 
The number of cells considered is n = 60, 69, 98 and 157 for Λ = 0, 0.5, 
1 and 2, respectively. 
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Extended Data Figure 8 | Effect of the gradual apical cell contractility 
increase on epithelium folding in silico. This figure is associated with Fig. 3. 
a-d, In silico models showing whole tissue shape with an apico-basal force of 
1Λ in 30 apoptotic cells and increasing values of apical contractility in 
neighbouring cells. For each panel, from left to right are represented (1) a 
scheme representing the gradual increase of contractility values applied in 
apoptotic neighbours, (2) a three-dimensional representation showing whole 
tissue shape and (3) for each condition, the corresponding cell outlines 
extracted from three-dimensional simulations in which cell anisotropy, area 
and orientation from the fold domain are colour-coded. Note the gradual 
increase in anisotropy, gradual decrease of cell area and the preferential 
orientation with the gradual increase of contractility. e, The mean values of 
radius, anisotropy, area and orientation of cells from the whole fold domain 
of simulations a-d are represented as box plots. The ∅ symbol corresponds 
to the condition in the absence of apoptosis (from Extended Data Fig. 6b). 
The number of cells considered is n = 31, 61, 98 and 137 for Γ = 0, 0.5, 
1 and 2, respectively. 
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Extended Data Figure 9 | Fold formation as a result of apoptotic forces. 
This figure is associated with Fig. 4. a-a’’, Schematics of wing discs depicting 
the pattern of apoptosis and the presence or absence of myosin II activity in 
apoptotic cells in each genetic context analysed in b-b’’’ and c-c’. Control 
(a, b, y,w,hs::flp; act-frt-y + -frt::Gal4, uas::GFP / uas::lifeactGFP; uas::rpr 
without clones), Ectopic cell death (a’, b’, b’’’, y,w,hs::flp; act-frt-y + -frt::Gal4, 
uas::GFP / uas::lifeactGFP; uas::rpr), and ectopic cell death without myosin II 
activity specifically in apoptotic cells (a’’, b’’, y,w,hs::flp; uas::DN-zip::GFP, 
uas::hid; act-frt-CD2-frt::Gal4). Note that, in the latter condition, myosin II 
activity is maintained in living cells. Wing discs were dissected from larvae 
heat shocked for 20 min at 38 °C. b-c’, For each panel, sagittal views and 
schematics of sagittal sections are shown (sagittal views correspond to the black 
dotted line indicated in a-a’’). b-b’’’, A high concentration of myosin II 
positive apoptotic cells is sufficient to induce a fold in a naive tissue (red 
arrowhead, b’, n = 11) as shown by the visualization of the wing disc apical 
surface stained with an anti-β-catenin antibody (compare b’ with b). Note that 
no ectopic fold is observed when only a low number of apoptotic events 
occur (b’’’) or when myosin II is inhibited in apoptotic cells (open arrowhead, 
n = 5 out of 6) (b’’). c-c’, F-actin accumulates in ectopic folds (red arrowhead) 
when apoptotic myosin II is active in ectopic dying cells (compare c’ with 
c, n = 3). d, Quantification of myosin II levels in the patched (ptc) domain 
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(domain of cell death induction) in control (w; ptc::Gal4, uas::GFP; tub::G80[ts] 
/ SM5-TM6B); ptc > rpr (w; ptc::Gal4, uas::GFP / uas::lifeactGFP; tub::G80[ts] 
/ uas::rpr) and ptc > rpr+DN-MyoII (w; ptc::Gal4, uas::GFP / uas::DN-zip::GFP; 
tub::G80[ts] / uas::rpr) wing discs (control, n = 24; ptc > rpr, n = 28; 
ptc > rpr+DN-MyoII, n = 32). Values are represented as means, with 
error bars representing standard errors. The intensity of the signal in the ptc 
domain has been normalized to the intensity in the anterior and posterior 
domains of the same disc (n.s., non-significant). We used the non-parametric 
Wilcoxon rank sum test (also called the Mann-Whitney test). 
e-e’’, Wing disc close-ups (of wing discs shown in Fig. 4c) and schematics in 
the absence (e) or presence (e’ and e’’) of ectopic apoptosis in the ptc domain 
(red cells, false-coloured in red on the black and white images), with (e’) or 
without (e’’) myosin II activity in dying cells. e’, Note that we can distinguish 
two distinct pools of stabilized apical myosin II: “contractile ring myosin II” 
required for dying cell extrusion (blue arrows, purple in schematics) and “fold 
domain apical myosin II” stabilized in response to the apico-basal apoptotic 
force (red arrows, green in schematics). e’’, Note that, consistent with 
normal extrusion in this background, contractile ring myosin II is still present 
around apoptotic cells (blue arrows), whereas fold domain apical myosin II 
is absent. The star marks a dividing cell, further indicating that myosin II is 
still present in apoptotic cell neighbours. 
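The normalization and error bars described for panel d can be sketched as below. The exact formula is not given in the legend, so dividing the ptc-domain intensity by the mean of the anterior and posterior intensities of the same disc is an assumption, and the function names and example numbers are hypothetical.

```python
import math
from statistics import mean, stdev


def normalised_intensity(ptc, anterior, posterior):
    """Signal in the ptc domain relative to the flanking domains.

    Assumption: the reference is the mean of the anterior and
    posterior intensities measured in the same disc.
    """
    return ptc / mean([anterior, posterior])


def mean_and_sem(values):
    """Return the mean and the standard error of the mean (n >= 2)."""
    return mean(values), stdev(values) / math.sqrt(len(values))


# Invented measurements for three discs: (ptc, anterior, posterior).
discs = [(3.0, 1.0, 2.0), (2.4, 1.2, 1.2), (4.5, 1.5, 1.5)]
ratios = [normalised_intensity(*disc) for disc in discs]
print(ratios, mean_and_sem(ratios))
```

Normalizing within each disc before averaging removes disc-to-disc differences in staining and imaging, which is why the ratio rather than the raw intensity is compared between genotypes.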
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Extended Data Figure 10 | Model of apoptosis-dependent epithelium folding. 
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RNA helicase DDX21 coordinates transcription and 
ribosomal RNA processing 


Eliezer Calo, Ryan A. Flynn, Lance Martin, Robert C. Spitale, Howard Y. Chang & Joanna Wysocka 


DEAD-box RNA helicases are vital for the regulation of various 
aspects of the RNA life cycle, but the molecular underpinnings of 
their involvement, particularly in mammalian cells, remain poorly 
understood. Here we show that the DEAD-box RNA helicase DDX21 
can sense the transcriptional status of both RNA polymerase (Pol) I 
and II to control multiple steps of ribosome biogenesis in human 
cells. We demonstrate that DDX21 widely associates with Pol I- and 
Pol II-transcribed genes and with diverse species of RNA, most prom- 
inently with non-coding RNAs involved in the formation of ribo- 
nucleoprotein complexes, including ribosomal RNA, small nucleolar 
RNAs (snoRNAs) and 7SK RNA. Although broad, these molecular 
interactions, both at the chromatin and RNA level, exhibit remark- 
able specificity for the regulation of ribosomal genes. In the nucleolus, 
DDX21 occupies the transcribed rDNA locus, directly contacts both 
rRNA and snoRNAs, and promotes rRNA transcription, processing 
and modification. In the nucleoplasm, DDX21 binds 7SK RNA and, 
as a component of the 7SK small nuclear ribonucleoprotein (snRNP) 
complex, is recruited to the promoters of Pol II-transcribed genes 
encoding ribosomal proteins and snoRNAs. Promoter-bound DDX21 
facilitates the release of the positive transcription elongation factor b 
(P-TEFb) from the 7SK snRNP in a manner that is dependent on its 
helicase activity, thereby promoting transcription of its target genes. 
Our results uncover the multifaceted role of DDX21 in multiple steps 
of ribosome biogenesis, and provide evidence implicating a mam- 
malian RNA helicase in RNA modification and Pol II elongation 
control. 

RNA helicases are highly conserved enzymes that use the energy of ATP 
to remodel RNA secondary structures and ribonucleoprotein complexes 
during various steps of RNA metabolism. In particular, the nucleolar 
helicase DDX21 is required for pre-rRNA processing, but the specific 
mechanism underlying this requirement remains unknown. Notably, 
DDX21 also influences c-Jun transcriptional activities, suggesting a 
potential role in gene expression. To explore this, we first interrogated 
the chromatin association of DDX21 in HEK293 cells by chromatin 
immunoprecipitation followed by high-throughput DNA sequencing 
(ChIP-seq). Given that pre-rRNA processing occurs coordinately with 
rDNA transcription, we examined binding of DDX21 to the rDNA locus 
(Fig. 1a). DDX21 broadly, but specifically, associated with the transcribed 
region of the rDNA, but not with the intergenic spacer, a profile characteristic 
of known Pol I-associated co-transcriptional regulators. In 
addition to rDNA binding, we identified 4,420 high-confidence peaks, 
most residing within 5 kilobases (kb) from annotated Pol II transcrip- 
tional start sites (Fig. 1b). DDX21-bound promoters had, on average, high 
enrichment of Pol II and active chromatin marks (histone H3 Lys 4 trime- 
thylation (H3K4me3), H3K27 acetylation (H3K27ac) and H3K9ac), but 
were depleted for repressive (H3K27me3 and H3K9me3) and promoter- 
distal (H3K4me1) marks (Fig. 1c, d). Analysis of transcription factor 
motifs enriched at DDX21-bound regions uncovered recognition motifs 
of factors implicated in cell growth and proliferation (for example, E2F, 
STAT1, NRF1 and ETS; Extended Data Fig. 1a). ChIP-seq results were 


verified by ChIP-qPCR (quantitative PCR) in two additional human cell 
lines, with all interrogated target regions showing enrichment by qPCR 
(Extended Data Fig. 1b and data not shown), indicating that the chro- 
matin interactions of DDX21 are reproducible across multiple cell types. 
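The statement that most peaks lie within 5 kb of an annotated transcriptional start site can be checked with a nearest-TSS search. The sketch below is a simplified illustration on a single chromosome with integer coordinates; the function names and coordinates are hypothetical and it is not the peak-annotation pipeline used in the study.

```python
from bisect import bisect_left


def distance_to_nearest_tss(peak_centre, sorted_tss):
    """Distance (bp) from a peak centre to the closest TSS.

    `sorted_tss` must be a non-empty, ascending list of TSS positions
    on the same chromosome as the peak.
    """
    i = bisect_left(sorted_tss, peak_centre)
    candidates = []
    if i < len(sorted_tss):
        candidates.append(sorted_tss[i])      # first TSS at or after the peak
    if i > 0:
        candidates.append(sorted_tss[i - 1])  # last TSS before the peak
    return min(abs(peak_centre - tss) for tss in candidates)


def fraction_within_window(peak_centres, tss_positions, window=5000):
    """Fraction of peaks whose nearest TSS is within `window` bp."""
    sorted_tss = sorted(tss_positions)
    hits = sum(
        1 for centre in peak_centres
        if distance_to_nearest_tss(centre, sorted_tss) <= window
    )
    return hits / len(peak_centres)


# Invented coordinates for illustration only.
tss = [10_000, 50_000]
peaks = [12_000, 30_000, 54_000, 56_000]
print(fraction_within_window(peaks, tss))
```

For genome-wide data the same search would be run per chromosome, with strand-aware TSS coordinates taken from the gene annotation.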

Gene Ontology analyses of DDX21-bound regions revealed specific 
and highly significant association with several regulatory arms of the 
ribosomal pathway (Fig. 1e). To verify this further, we compared annotations 
of DDX21-bound promoters to those H3K4me3-enriched but 
DDX21-unbound (Extended Data Fig. 1c). As expected, DDX21-bound 
promoters were enriched for ribosomal Gene Ontology terms, while 
DDX21-unbound promoters were enriched for other biological pro- 
cesses (Extended Data Fig. 1d). DDX21 binding was evident at promot- 
ers of genes encoding components of both the 40S (for example, RPS3) 
and 60S (RPL23A and RPL8) subunits (Fig. 1f). Messenger RNAs that 
encode ribosomal proteins often harbour snoRNAs in their introns. 
DDX21 binds promoters of more than 80% of snoRNA-containing host 
genes; those unbound are poorly expressed in HEK293 cells (Fig. 1g and 
Extended Data Fig. 1e-g). 
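Enrichment of an annotation among bound regions is commonly scored with a binomial test over genomic regions, as in GREAT. The sketch below computes the upper-tail probability in log space so that it stays numerically stable for thousands of regions; the function names and all numbers in the example are hypothetical and are not results from this study.

```python
import math


def log_binomial_pmf(k, n, p):
    """Natural log of P(X = k) for X ~ Binomial(n, p), with 0 < p < 1."""
    log_choose = (math.lgamma(n + 1) - math.lgamma(k + 1)
                  - math.lgamma(n - k + 1))
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)


def binomial_upper_tail(k, n, p):
    """P(X >= k): chance of at least k of n regions hitting the term.

    p is the fraction of the genome associated with the annotation.
    """
    total = sum(math.exp(log_binomial_pmf(i, n, p)) for i in range(k, n + 1))
    return min(1.0, total)


# Hypothetical example: 40 of 1,000 regions hit a term covering 1% of the genome.
print(binomial_upper_tail(40, 1000, 0.01))
```

Working with log-probabilities avoids the overflow that a direct product of a large binomial coefficient and a small power would cause at this scale.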

To examine the effect of DDX21 on transcription, we depleted the 
protein using two independent short interfering RNA (siRNA) pools 
(Fig. 1h and Extended Data Fig. 2a). DDX21 knockdown decreased the 
steady-state levels of transcripts originating from DDX21-bound pro- 
moters, but had minimal effect on the unbound gene transcripts (Fig. 1h 
and Extended Data Fig. 2b). To explore whether DDX21 directly regulates 
transcription of ribosomal mRNAs, we measured the effect of DDX21 
depletion on the synthesis of nascent transcripts upon release from 
the transcriptional elongation block induced by the kinase inhibitor 
flavopiridol. We transfected HEK293 cells with control or DDX21 
3’ untranslated region (UTR) siRNAs, followed by expression of siRNA-resistant 
wild-type (DDX21^WT) or ATPase-defective DDX21 (herein 
DDX21^SAT; Fig. 1i and Extended Data Fig. 2c). DDX21 knockdown 
impaired the production of nascent transcripts originating from DDX21-bound 
promoters, and this effect was rescued by the introduction of 
DDX21^WT, but not DDX21^SAT (Fig. 1j and Extended Data Fig. 2d). 
Similar results were obtained on transcripts originating from the rDNA locus 
(Extended Data Fig. 2e). By contrast, non-target genes were minimally 
affected by the loss or ectopic expression of DDX21 (Fig. 1j and Extended 
Data Fig. 2d). Thus, DDX21 associates with and positively regulates tran- 
scription of Pol I- and Pol II-dependent ribosomal genes in a helicase- 
dependent manner. 

The aforementioned observations prompted us to investigate poten- 
tial crosstalk between DDX21 functions across nuclear compartments. 
Consistent with previous studies, inhibition of Pol I with either CX-5461 
or a low dose of actinomycin-D (ref. 14) displaced DDX21 from the 
nucleolus, whereas localization of the nucleolar protein fibrillarin was 
not affected under these conditions (Fig. 2a and Extended Data Fig. 3a, b). 
Notably, hour-long inhibition of Pol II with flavopiridol recapitulated 
the nucleolar exclusion of DDX21 (Fig. 2a). By contrast, serum star- 
vation or treatment with metabolic inhibitors impacting either cellular 
respiration or the mTOR pathway did not alter DDX21 localization 


1Department of Chemical and Systems Biology, Stanford University School of Medicine, Stanford, California 94305, USA. 2Howard Hughes Medical Institute and Program in Epithelial Biology, Stanford 
University School of Medicine, Stanford, California 94305, USA. 3Department of Developmental Biology, Stanford University School of Medicine, Stanford, California 94305, USA. 
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Figure 1 | DDX21 associates with actively transcribed ribosomal genes. 

a, DDX21 ChIP-seq reads were mapped to a custom annotation file of the 
human rDNA locus and compared to input reads. IGS, intergenic spacer. 

b, Distribution of DDX21 ChIP-seq peaks over known genomic features. TSS, 
transcription start site; TTS, transcription termination site. c, d, Average 
ChIP-seq signal profiles from publicly available data sets (see Methods) were 
generated for Pol II (c) and the indicated histone modifications around the 
centre of DDX21-bound regions (d). bp, base pairs. e, Genomic regions 
enrichment of annotations tool (GREAT) analyses of DDX21-bound regions. 
The x axis corresponds to the negative binomial P values. GO, Gene Ontology. 
f, University of California Santa Cruz (UCSC) genome browser tracks of 
DDX21 ChIP-seq at ribosomal genes containing intronic snoRNAs. 


(Extended Data Fig. 4), underscoring the preferential sensitivity of DDX21 
to transcriptional inhibition. Furthermore, inhibition of either Pol I or 
II impaired the association of DDX21 with both the rDNA and Pol II- 
regulated promoters (Fig. 2c and Extended Data Fig. 3c, d). This change 
in chromatin association was not due to widespread chromatin silenc- 
ing and compaction, as inhibitors did not affect CTCF binding at the 
rDNA- or Pol II-regulated chromatin (Extended Data Fig. 3e, f). Therefore, 
the chromatin association of DDX21 relies on the transcriptional 
status of either Pol I or II, suggesting coordination of the functions of 
DDX21 across subnuclear compartments. 

The roles of DDX21 in transcription and rRNA processing are dependent 
on its intact helicase domain. We proposed that defining the 
RNA interactome of DDX21 would reveal insights into the molecu- 
lar mechanisms underlying its diverse functions. To identify DDX21- 
associated RNAs systematically, we performed tandem purification iCLIP 
(individual-nucleotide-resolution crosslinking and immunoprecipitation) 
(Fig. 3a and Extended Data Fig. 5a, b) in HEK293 cells induced 
to express Flag- and haemagglutinin-tagged DDX21 (Flag-HA-DDX21). 
DDX21 interacts with a diverse set of RNAs, of which rRNA and snoRNAs 
were most highly represented, while mRNAs contributed only 1.1% of 
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g, Quantitative assessment of DDX21 binding to the promoters of snoRNA 
host genes. h, qRT-PCR analysis of a representative panel of DDX21-bound 
genes upon DDX21 knockdown. Bars represent the average of three 
independent experiments. For DDX21-target genes, expression difference 
P < 0.05 (Student’s t-test). i, qRT-PCR analysis of DDX21 levels after 
transfecting control or DDX21 siRNA and/or subsequent overexpression of 
siRNA-resistant DDX21^WT or DDX21^SAT. j, RT-PCR analysis assessing 
the synthesis of newly made, unspliced transcripts upon DDX21 knockdown 
and/or reconstitution with siRNA-resistant DDX21^WT or DDX21^SAT (see 
Methods). Data are mean and s.d. of three independent biological replicates. 
DMSO, dimethylsulphoxide; FL, flavopiridol. 


the iCLIP reads (Fig. 3b). Gene ontology term and KEGG pathway anal- 
ysis linked these mRNAs to ribosome function (Fig. 3b). Comparisons 
between DDX21 iCLIP targets and those of the splicing factor hnRNP-C 
revealed little overlap, underscoring the specificity of our results 
(Extended Data Fig. 5c-e). We further confirmed select iCLIP interac- 
tions by ultraviolet RNA immunoprecipitation and quantitative reverse 
transcription PCR (qRT-PCR) (Extended Data Fig. 5f). 

rRNA and snoRNAs represent candidate direct partners for DDX21- 
mediated rRNA processing function. DDX21 broadly crosslinks to rRNA, 
with the strongest binding overlapping 2’-O-methylation (2’-Ome) and 
pseudouridylation (Ψ) sites (Fig. 3c), which are targeted and modified 
by snoRNP complexes. Furthermore, DDX21 robustly interacts with 
regions in the 5’ external transcribed spacer, which is bound and processed 
by the U3 snoRNA. Consistently, U3 is the most enriched short 
repetitive RNA of DDX21, crosslinking to DDX21 in two distinct 5’ 
and 3’ regions of known rRNA targeting function (Extended Data 
Figs 6f and 8c). iCLIP of DDX21^SAT revealed that it retained the ability 
to bind the major classes of RNA recovered with the wild-type enzyme 
(Extended Data Fig. 6a-d). However, there was a marked restriction of 
DDX21^SAT to 18S rRNA (Extended Data Fig. 6e). Notably, the inability 
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Figure 2 | DDX21 chromatin association is sensitive to Pol I and Pol II 
transcriptional status. a, Representative immunofluorescence images of 
methanol-fixed HEK293 cells after 1-h incubation with DMSO, actinomycin-D 
(50 ng ml⁻¹) or flavopiridol (1 µM). Scale bars, 10 µm. DAPI, 4’,6-diamidino-2-phenylindole. 
b, c, ChIP-qPCR analysis in HEK293 cells sampling DDX21 
genomic occupancy at the rDNA locus and at a representative panel of 

Pol II-regulated, DDX21-target promoters, upon treatment with DMSO, 
actinomycin-D or flavopiridol. Data are mean and s.d. of three independent 
experiments. Neg, negative controls. 


of DDX21^SAT to bind the 5’ external transcribed spacer was accompanied 
by the loss of the 5’ end binding to U3 (Extended Data Fig. 6f). Importantly, 
marked loss of DDX21^SAT association with the rRNA occurred 
in the absence of transcriptional defects in snoRNA production (Extended 
Data Fig. 6g). 

Consistent with preferential association of DDX21^WT at rRNA modification 
sites, snoRNAs represent the principal class of DDX21-bound 
RNAs (Fig. 3d), as exemplified by snorD66 and snorA67, representing 
the C/D- and H/ACA-box subfamilies, respectively. DDX21 crosslinked 
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failed to rescue rRNA methylation levels at multiple sites, especially on 
the 28S rRNA, whereas DDX21 recovered the 2’-Ome defects (Fig. 3f). 
We cannot exclude the possibility that the observed 2'-Ome defects were 
due to the role of DDX21 in snoRNA transcription. However, changes 
in snoRNA binding pattern between DDX21^SAT and DDX21^WT can 
be readily detected in the absence of snoRNA transcriptional defects 
(Extended Data Figs 6f, g and 7f). Therefore, the helicase domain of 
DDX21 is required for proper binding to both rRNA and snoRNAs, 
suggesting a direct function within the snoRNP. 

We next examined our iCLIP results for insights into the role of 
DDX21 in regulating Pol II-dependent transcription. While ultraviolet 
crosslinking to mRNAs suggested a potential cis-mechanism at DDX21- 
target genes, transcripts from most DDX21-occupied promoters were 
not recovered in iCLIP and only 8% of the iCLIP mRNA reads mapped 
within 5’ UTRs (Extended Data Fig. 8a, b). Thus, nascent RNA tether- 
ing is unlikely to be the major mechanism of DDX21 chromatin recruit- 
ment. Notably, 7SK snRNA, a well-known trans-acting non-coding RNA 
involved in Pol II transcription, was among the most highly enriched 
DDX21-bound RNAs (Extended Data Fig. 8c). Together with HEXIM1/2 
and P-TEFb (which consists of a CDK9 and cyclin T1 heterodimer), 7SK 
functions to modulate Pol II promoter pause-release. We observed 
DDX21 crosslinking most robustly to two specific sites on 7SK (Fig. 4a), 
outside the known binding sites of HEXIM1/2 and P-TEFb (Extended 
Data Fig. 8d). Recent evidence demonstrated that the 7SK snRNP is physically 
associated with Pol II promoters, where P-TEFb is then released. 
We proposed that DDX21 is recruited to Pol II promoters together with 
the 7SK snRNP. Consistent with this hypothesis, protein components 
of the 7SK snRNP associate with DDX21 in reciprocal co-immunopre- 
cipitation experiments (Extended Data Fig. 8e, f). Furthermore, both 
HEXIM1 and CDK9 are bound at DDX21-target Pol II promoters, but 
not at the rDNA locus (Extended Data Fig. 9a, b). Finally, depletion of 
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7SK with two independent antisense oligonucleotides” strongly reduced 
the association of DDX21 with Pol II-bound promoters, but not at the 
rDNA (Fig. 4b and Extended Data Fig. 9c), indicating that 7SK facil- 
itates the association of DDX21 with promoters. Notably, similar to 
DDX21, Gene Ontology analyses of CDK9 ChIP-seq from HEK293 cells 
revealed enrichment for components of the ribosomal pathway (Extended 
Data Fig. 9d), suggesting that the 7SK snRNP complex might have an 
inherent preference for binding at ribosomal and growth control genes. 
Nonetheless, this preference alone cannot entirely explain the specifi- 
city of DDX21 targeting to ribosomal genes, because we also detected 
HEXIM1 and CDK9 at active promoters unbound by DDX21 (Extended 
Data Fig. 9a, b). 

Transcriptional elongation is triggered by phosphorylation of the Pol II 
carboxy-terminal domain at serine 2 (Ser2p) upon release of P-TEFb 
from 7SK. To test whether DDX21 facilitates P-TEFb release, we 
performed a ‘release assay’ by purifying the inactive form of the 7SK 
snRNP. Incubation of the inactive 7SK snRNP with purified Flag-HA- 
DDX21 resulted in the dose-dependent release of CDK9, whereas a con- 
trol Flag purification showed no release (Fig. 4c, d). Consistent with its 
role in promoting the activity of P-TEFb, DDX21 knockdown impaired 
Pol II Ser2p at the 3’ ends of DDX21-target genes (Fig. 4e). Total levels 
of Pol II at the same regions were also diminished, concordant with an 
elongation defect (Extended Data Fig. 9e). 

DDX21 promotes transcription in a manner dependent on its cata- 
lytic domain (Fig. 1j and Extended Data Fig. 2d) and we proposed that 
this feature would extend to its role in P-TEFb release. Notably, although 
DDX21^SAT is recruited to Pol II promoters, binds to 7SK and interacts 
with P-TEFb in the lysate (Extended Data Fig. 10a-c), it shows marked 
differences in the 7SK ultraviolet crosslinking as compared to DDX21^WT, 
with accumulation at the single-stranded region upstream from the fourth 
stem loop, involved in P-TEFb binding and inhibition (Extended Data 


Figure 4 | DDX21 promotes release of P-TEFb 
from the 7SK snRNP. a, DDX21^WT iCLIP reads 
mapped to the 7SK snRNA. The four stem-loops 
(SL1-4) are marked below the iCLIP reads and 
are shadowed on the cartoon in c. b, ChIP-qPCR 
of DDX21 in control or 5’-7SK-antisense 
oligonucleotide (ASO)-treated HEK293 cells at 
representative Pol II-regulated, DDX21-target gene 
promoters, negative control regions (Neg), and 
the rDNA locus. Data are mean and s.d. For 
DDX21-target genes the difference between 
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e, ChIP-qPCR of Pol II Ser2p of representative 
Pol II-regulated genes at their transcriptional 
termination sites (TTS) in control or DDX21- 
siRNA-treated cells. Data are mean and s.d. of 
three independent experiments. f, Western blot 
analyses of a P-TEFb release assay with either 
DDX21^WT or catalytically impaired DDX21^DEV 
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Fig. 10c). These observations suggest that DDX21^SAT recognizes 7SK, 
but is unable to remodel the RNA. Indeed, we find that both DDX21^SAT 
and an additional catalytically defective mutant, DDX21^DEV (Extended 
Data Fig. 2c), failed to release P-TEFb from the inactive 7SK snRNP 
(Fig. 4f). Therefore, DDX21 requires its active helicase domain to drive 
Pol II-dependent transcription through release of P-TEFb and conse- 
quently promote transcriptional elongation. 

Collectively, we showed that DDX21 is incorporated into distinct 
snRNP complexes, 7SK snRNP and snoRNP (Extended Data Fig. 10d), 
to regulate transcriptional and post-transcriptional steps of ribosome 
biogenesis. Our data suggest that DDX21 is a key component in coordi- 
nating transcriptional programs across distinct nuclear compartments, 
as its engagement with chromatin is sensitive to the status of both Pol I 
and Pol II. We propose that through its multifaceted function in ribo- 
some biogenesis, DDX21 has a key role in regulating cellular growth in 
health and malignancy. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper.

METHODS


HEK293 cells were cultured in DMEM plus 10% FBS and maintained under stan- 
dard tissue culture conditions. ChIP-seq and ChIP-qPCR analyses were conducted 
in HEK293 cells with commercially available antibodies as described. The nascent 
transcript assays have been described elsewhere. In brief, HEK293 cells stably
expressing inducible transgenes for either DDX21^WT or DDX21^SAT were cultured
in tetracycline-free serum (GIBCO) and seeded at a density of 30-50% confluency
overnight. Cells were transfected for two consecutive days with a pool of siRNA
targeting the DDX21 3′ UTR (see later in the text). Twelve hours after the second
transfection, 0.025 µg ml⁻¹ doxycycline (dox) was added to the media for an
additional 24 h. At this point cells were treated for 1 h with either DMSO or flavopiridol
(1.0 µg ml⁻¹) to inhibit Pol II elongation. For 'wash-off' cultures, cells were washed
3-4 times with pre-warmed media and allowed to recover transcription for either 
30 or 60 min. Cells were immediately collected in TRIzol and RNA was extracted 
as described later. For immunofluorescence studies cells were fixed in methanol 
and stained for the indicated antibodies. Small molecule treatments with DMSO, 
actinomycin-D, flavopiridol or CX-5461 were done for 1 h, unless otherwise specified. 
iCLIP was performed from HEK293 cells stably expressing Flag-HA-DDX21. The 
RNaseH cleavage assay was optimized from a previously reported method to cleave 
RNA at specific bases in a manner sensitive to the 2'-Ome status of the nucleotide 
of interest. For the P-TEFb release assay, the inactive 7SK snRNP complex was purified
by immobilizing HEXIM1, followed by incubation with biochemically purified
Flag-HA-DDX21. The predicted 7SK structure has been published elsewhere.
Cell lines. HEK293 and HeLa cells were obtained from ATCC and grown under 
standard conditions in DMEM plus 10% FBS supplemented with antibiotics. All cell 
lines used in this study are mycoplasma-free. For generating stable HEK293 cells 
expressing Flag-HA-DDX21 (FH-DDX21) full-length DDX21 was amplified from 
complementary DNA and cloned into a dox-inducible pTrip-Flag-HA lentiviral 
vector. The cloning strategy for generating DDX21^DEV and DDX21^SAT has been
previously described. HEK293 cells were infected and several clones were expanded
for further analyses and tested for mycoplasma. For the iCLIP experiments, expression
of the DDX21 transgenes was achieved by addition of dox (0.025 µg ml⁻¹) to
the media. Cells were collected 24 h after the addition of dox.

ChIP-qPCR and ChIP-seq. ChIP assays were performed as previously described.
In brief, HEK293 cells were cross-linked with 1% formaldehyde for 10 min at room
temperature and quenched with glycine to a final concentration of 0.125 M for
another 10 min. Chromatin was sonicated with a Bioruptor (Diagenode), cleared
by centrifugation, and incubated overnight at 4 °C with 5-7 µg of the desired antibodies:
anti-DDX21 (Novus Biologicals NB100-1781 and NBP1-83310), anti-Pol
II Ser2p (Active Motif 61084), anti-Pol II (Santa Cruz Biotechnology sc-899), anti-CTCF
(Cell Signaling 2899) and anti-CDK9 (1:1 mix of Santa Cruz Biotechnology sc-8338
and sc-484). Immunocomplexes were immobilized with 100 µl of protein-G
Dynal magnetic beads (Life Technologies) for 4 h at 4 °C, followed by stringent washes 
and elution. Eluates were reverse cross-linked overnight at 65 °C and deproteinated 
with proteinase K at 56 °C for 30 min. DNA was extracted with phenol chloroform, 
followed by ethanol precipitation. ChIP-seq libraries were prepared according to the 
NEBNext protocol and sequenced using Illumina HiSeq 2500. ChIP-qPCR analyses 
were performed in a Light Cycler 480II machine (Roche). ChIP-qPCR signals were 
calculated as percentage of input. Fold induction was calculated over a negative geno- 
mic region. All primers used in qPCR analyses are shown in Supplementary Table 1. 
All ChIP antibodies have been previously validated unless otherwise specified. 
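
The percentage-of-input and fold-over-negative calculations mentioned above can be sketched as follows. This is a minimal illustration and not the authors' analysis script: it assumes roughly 100% PCR efficiency, and the function names, the Ct values and the 1% input fraction are hypothetical.

```python
import math


def percent_input(ct_ip: float, ct_input: float, input_fraction: float) -> float:
    """Return a ChIP-qPCR signal as a percentage of input chromatin.

    Assumes roughly 100% PCR efficiency (signal doubles every cycle).
    input_fraction is the share of chromatin kept as input, e.g. 0.01 for 1%.
    """
    # Shift the input Ct to the value expected if 100% of the chromatin were assayed.
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)


def fold_over_negative(pct_target: float, pct_negative: float) -> float:
    """Return enrichment at a target region relative to a negative genomic region."""
    if pct_negative <= 0:
        raise ValueError("negative-region signal must be positive")
    return pct_target / pct_negative


if __name__ == "__main__":
    # Hypothetical Ct values for a target promoter and a negative control region.
    target = percent_input(ct_ip=26.0, ct_input=24.0, input_fraction=0.01)
    negative = percent_input(ct_ip=30.0, ct_input=24.0, input_fraction=0.01)
    print(target, negative, fold_over_negative(target, negative))
```

Expressing every amplicon as a percentage of input first, and only then dividing by the negative region, keeps the two normalizations separable when comparing antibodies or treatments.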
ChIP-seq analyses. Sequences were mapped using DNAnexus software tools and 
analysed by QuEST and MACS2. For QuEST, ChIP-seq peaks were determined 
using a kernel density estimate bandwidth of 30, a ChIP candidate threshold of 20, 
a ChIP extension fold enrichment of 3, and a ChIP-to-background fold enrichment 
of 3. WIG files were generated with QuEST and used for visualization in the UCSC 
Genome Browser and for obtaining average signal profiles. Average ChIP-seq signal 
profiles around the centre of DDX21 ChIP-seq peaks were generated with the Sitepro 
tool, which is part of the Cistrome/Galaxy pipeline. We used the HOMER software 
to associate DDX21 ChIP-seq peaks to different genomic features. Functional annotation
and Gene Ontology categories were obtained with GREAT. For ascribing
DDX21 binding to snoRNA-host genes, we generated a file containing all snoRNAs 
and their associated host genes and only snoRNAs residing within introns of RefSeq 
genes were used in this study. To avoid redundancies all entries were inspected 
manually in the UCSC genome browser. 
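
The restriction to snoRNAs that reside within introns of their host genes can be illustrated with a short sketch. It is an assumption-laden example and not the procedure used to build the annotation file: coordinates are taken to be 0-based and half-open, strand is ignored, and the gene names and positions are hypothetical.

```python
from typing import Dict, List, Tuple

Exons = List[Tuple[int, int]]  # (start, end) pairs, 0-based and half-open


def introns(exons: Exons) -> Exons:
    """Return the gaps between consecutive exons of one transcript."""
    ordered = sorted(exons)
    gaps = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end:
            gaps.append((prev_end, next_start))
    return gaps


def intronic_hosts(
    sno_chrom: str,
    sno_start: int,
    sno_end: int,
    genes: Dict[str, Tuple[str, Exons]],
) -> List[str]:
    """Return names of genes that fully contain the snoRNA within one intron."""
    hosts = []
    for name, (chrom, exons) in genes.items():
        if chrom != sno_chrom:
            continue
        # The snoRNA must lie entirely inside a single intron of the gene.
        if any(start <= sno_start and sno_end <= end for start, end in introns(exons)):
            hosts.append(name)
    return sorted(hosts)


if __name__ == "__main__":
    # Hypothetical host gene models.
    genes = {
        "HOST1": ("chr1", [(100, 200), (500, 600), (900, 1000)]),
        "HOST2": ("chr2", [(100, 200), (500, 600)]),
    }
    print(intronic_hosts("chr1", 250, 320, genes))
```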

All genomic data sets have been deposited under the GEO record GSE56802. 
Other data sets used in this study were obtained from GSM891237, GSE36620, 
GSM1249888, GSM1249889, GSE20598, GSM1249897 and GSE20598. 

For mapping DDX21 ChIP-seq to the rDNA, we obtained the DNA consensus 
sequence of the 43-kb ribosomal locus from NCBI (GenBank ID: U13369.1). This 43-kb region
itself is unique relative to other locations in the human genome; however, as noted, 
it is repeated hundreds of times in each of the 5 chromosomal clusters. Using this 


unique 43-kb region, we used the Bowtie algorithm to map ChIP-seq reads with 
standard parameters used for mapping to the Hg19 human genome build. The same 
strategy has been employed by other groups to map transcription factor ChIP reads
to the rDNA locus.

RNA extraction and qRT-PCR. RNA was isolated using Trizol (Life Technolo- 
gies) according to the manufacturer's protocol. All RNA samples were DNase-treated 
with the Turbo DNA-Free kit (Ambion). cDNA was generated using SuperScript 
VILO (Life Technologies) according to manufacturer instructions. qPCR analyses 
were performed on the Light Cycler 480II (Roche). All primers used are shown in 
Supplementary Table 1. 

Immunofluorescence. HEK293 cells were seeded into 24-well plates containing 
12-mm glass coverslips and cultured for 16 h in DMEM containing 10% FBS (v/v). 
Cells were then treated with the indicated drugs (refer to the corresponding figure
legends for drug concentration and time scale of the experiment). Cells were fixed
in 4% paraformaldehyde for 10 min at room temperature, washed 3 × 5 min with
PBS, then fixed in ice-cold methanol for 2 min and washed 2 × 5 min with PBS.
Cells were permeabilized in PBS containing 0.3% (v/v) Triton X-100 for 5 min, and 
blocked overnight at 4 °C in PBT buffer (PBS with 1% BSA, 0.1% Triton X-100 (v/v), 
0.05% sodium azide (w/v)). After blocking, coverslips were incubated in PBT with 
the corresponding antibody. For DDX21 (Novus Biologicals NBP1-83310) the anti- 
body was diluted 1:200 and incubated at room temperature for 2 h. For fibrillarin 
(Cell Signaling C13C3) the antibody dilution was 1:100 and incubated at room 
temperature for 2 h. Coverslips were washed 3 × 5 min with PBT and incubated
with the Alexa-Fluor 568 secondary antibody (1:1,000; Life Technologies) for 1 h.
Cells were washed 3 × 5 min with PBT, 2 × 5 min with PBS, rinsed briefly with water
and mounted onto glass slides using VECTASHIELD mounting medium with DAPI. 
All images were taken and processed using a Zeiss LSM700 confocal microscope. 
Western blots and co-immunoprecipitation. HEK293 nuclear extracts were prepared
as described previously. For immunoprecipitations, extracts were incubated
overnight with 3 µg of the desired antibody pre-bound to protein G-sepharose
(Pierce). In some cases protein extracts were treated with RNaseA (20 µg ml⁻¹).
Immunocomplexes were eluted in 2× Laemmli buffer and resolved in an 8% acrylamide
gel. For western blots the following antibodies were used according to manufacturer
instructions: anti-NOP58 (Bethyl A302-718A); anti-fibrillarin (Cell Signaling
C13C3); anti-DKC1 (GeneTex GTX109000); anti-Flag (Sigma); anti-DDX21 (Novus
Biologicals NB100-1781); anti-LARP7 (a gift from D. H. Price); anti-CDK9 (Santa
Cruz Biotechnology sc-484); anti-cyclin T1 (Santa Cruz Biotechnology sc-10750);
and anti-HEXIM1 (Bethyl A303-113A). All antibodies have been previously validated
unless otherwise specified.

iCLIP and data analysis. The iCLIP method was performed as described before
with the specific modifications below. Twenty-four hours after dox treatment
(0.025 µg ml⁻¹), FH-DDX21^WT or FH-DDX21^SAT HEK293 cell lines were ultraviolet
crosslinked to a total of 0.3 J cm⁻². Whole-cell lysates were generated in CLIP
lysis buffer (50 mM HEPES, 200 mM NaCl, 1 mM EDTA, 10% glycerol, 0.1% NP- 
40, 0.2% Triton X-100, 0.5% N-lauroylsarcosine) and briefly sonicated using a 
probe-tip Branson sonicator to solubilize chromatin. Each iCLIP experiment was 
normalized for total protein amount, typically 2 mg, and partially digested with 
RNaseA (Affymetrix) for 10 min at 37 °C and quenched on ice. FH-DDX21 was 
isolated with anti-Flag agarose beads (Sigma) for 3 h at 4°C on rotation. Samples 
were washed sequentially in 1 ml for 5 min each at 4 °C: 2× high stringency buffer
(15 mM Tris-HCl, pH 7.5, 5 mM EDTA, 2.5 mM EGTA, 1% Triton X-100, 1%
sodium deoxycholate, 120 mM NaCl, 25 mM KCl), 1× high salt buffer (15 mM
Tris-HCl, pH 7.5, 5 mM EDTA, 2.5 mM EGTA, 1% Triton X-100, 1% sodium
deoxycholate, 1 M NaCl), 1× NT2 buffer (50 mM Tris-HCl, pH 7.5, 150 mM NaCl,
1 mM MgCl2, 0.05% NP-40). Purified FH-DDX21 was then eluted off anti-Flag
agarose beads using competitive Flag peptide elution. Each sample was resuspended
in 500 µl of Flag elution buffer (50 mM Tris-HCl, pH 7.5, 250 mM NaCl, 0.5% NP-40,
0.1% sodium deoxycholate, 0.5 mg ml⁻¹ Flag peptide) and rotated at 4 °C for
30 min. The Flag elution was repeated once for a total of 1 ml elution. FH-DDX21
was then captured using anti-haemagglutinin agarose beads (Pierce) for 1 h at 4 °C
on rotation. Samples were then washed as described above for the anti-Flag agarose beads.
3′-end RNA dephosphorylation, 3′-end single-stranded RNA ligation, 5′ labelling,
SDS-PAGE separation and transfer, autoradiograph, ribonucleoprotein isolation,
proteinase K treatment, and overnight RNA precipitation took place as previously
described. The 3′-end single-stranded RNA ligation adaptor was modified to contain
a 3′ biotin moiety as a blocking agent (Supplementary Table 1). The iCLIP
library preparation was performed as described elsewhere (R.A.F., L.M., R.C.S. and 
H.Y.C., unpublished observations). Final library material was quantified on the 
BioAnalyzer High Sensitivity DNA chip (Agilent) and then sent for deep sequencing
on the Illumina NextSeq machine for a 1 × 75-bp cycle run. iCLIP data analysis was
performed as previously described. For analysis of repetitive non-coding RNAs,
custom annotation files were built from the Rfam database and reads were mapped
under standard iCLIP processing steps. For the repetitive RNA analysis we normalized
the iCLIP reverse transcription stops across each RNA transcript, which allows the
comparison of the shape of the profile. With this normalization, the y axis is informative
only for the relative binding preference and does not imply the strength of
binding.
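
A minimal sketch of the per-transcript normalization described above, assuming that each position's reverse transcription stop count is divided by the total number of stops on that transcript. The function name and the counts are hypothetical and the code is not taken from the authors' pipeline.

```python
from typing import List


def normalize_rt_stops(stops: List[int]) -> List[float]:
    """Scale per-nucleotide RT stop counts so that they sum to 1 for one transcript."""
    total = sum(stops)
    if total == 0:
        # No crosslink evidence on this transcript: return a flat zero profile.
        return [0.0] * len(stops)
    return [count / total for count in stops]


if __name__ == "__main__":
    # Two hypothetical transcripts with the same binding shape but different depth.
    low_depth = normalize_rt_stops([2, 6, 2])
    high_depth = normalize_rt_stops([20, 60, 20])
    print(low_depth, high_depth)
```

Both hypothetical transcripts give the same normalized profile even though one has ten times more reads, which is why the normalized y axis describes where binding occurs and says nothing about how strongly each transcript is bound.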

siRNA and antisense oligonucleotide knockdown. For DDX21 knockdown, 
HEK293 cells (2.5 X 10°) were transfected in DMEM supplemented with 5% FBS 
without antibiotics using RNAiMAX (Life Technologies). DDX21 (20 nM) or con- 
trol siRNA was used in this study. Notably, DDX21 depletion was difficult to achieve, 
thus for efficient DDX21 knockdown (60-80% at the protein level) three consecu- 
tive siRNA transfections were required. siRNA diced pools were generated in 
J.W.’s laboratory using recombinant Giardia lamblia Dicer. 

ASO depletion of 7SK was performed as previously described. In brief, HEK293
cells (2 X 10°) were nucleofected with the Amaxa Nucleofector 2b (Lonza) with 
1 nmol of scramble control, 5’-7SK, or 3’-7SK (ref. 26). Cells were cultured for 
12h at 37 °C and collected for ChIP-qPCR as described above. A fraction of each 
sample was collected with Trizol for assaying 7SK knockdown by qRT-PCR as 
described above. 

Site-directed RNaseH rRNA cleavage assay. HEK293 cells were seeded and trans- 
fected with DDX21 or control siRNA as described above. Twelve hours after the 
second siRNA transfection, DDX21^WT or DDX21^SAT cDNAs were induced by adding
0.25 µg ml⁻¹ dox, after which cells were collected by scraping the monolayer
in ice-cold PBS. Total cellular RNA was isolated by Trizol extraction and RNeasy 
column clean up. To evaluate the fraction of methylation of specific nucleotides 
within the rRNA quantitatively, we optimized an established assay using RNaseH 
to cleave unmethylated RNA selectively. Several sites were selected based on iCLIP
reverse transcription stops to both the rRNA region of interest as well as reverse 
transcription stops on the targeting snoRNA transcript. Chimaeric 2’-Ome/DNA 
oligonucleotides were ordered for three sites (Supplementary Table 1), each containing
three DNA nucleobases for RNaseH targeting. For each cleavage reaction
1 µg of total RNA was mixed with 0.5 pmol of a specific chimaeric oligonucleotide
(final volume of 6 µl) and annealed to the rRNA by incubating at 80 °C for 2 min
and then step-cooling the sample to 25 °C, decreasing the temperature 1 °C per second.
Then 2.5 µl of 4× RNaseH cleavage buffer (80 mM Tris-HCl, pH 7.5, 40 mM MgCl2,
400 mM KCl, 0.4 mM dithiothreitol, 20 mM sucrose), 1 µl of RNaseH (Roche; note:
RNaseH from this supplier is critical as other suppliers or isolates have different
specificity for site-directed cleavage), and 1 µl of SUPERase In (Life Technologies) were
added to each reaction and incubated for 25 min at 37 °C. Samples were subsequently
purified using RNeasy columns and eluted in 100 µl of water. For visualization and
quantification, 5 µl of each sample was mixed with 5 µl of GLBII (Life Technologies)
and heated to 65 °C before agarose gel electrophoresis. Full-length and cleavage
products were imaged and quantified on a ChemiDoc XRS+ (BioRad).
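
One way to turn the quantified bands into a methylation estimate is sketched below. It rests on assumptions that the text does not state: stain intensity is taken to be proportional to RNA mass, background has already been subtracted, and cleavage of unmethylated rRNA is treated as complete. The function name and the intensities are hypothetical.

```python
def fraction_protected(full_length: float, cleavage_products: float) -> float:
    """Estimate the fraction of rRNA molecules left uncleaved by RNaseH.

    full_length: background-subtracted intensity of the intact rRNA band.
    cleavage_products: summed intensity of the cleavage fragment bands.
    Because the fragments of one cleaved molecule together carry the mass of one
    intact molecule, the mass fraction equals the molar fraction when stain
    intensity is proportional to mass.
    """
    total = full_length + cleavage_products
    if total <= 0:
        raise ValueError("no signal to quantify")
    return full_length / total


if __name__ == "__main__":
    # Hypothetical band intensities in arbitrary units.
    print(fraction_protected(full_length=75.0, cleavage_products=25.0))
```

Under these assumptions the protected fraction approximates the fraction of molecules carrying 2′-Ome at the targeted nucleotide, since methylated sites resist RNaseH.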
P-TEFb release assay. The P-TEFb release assay was performed as previously
described with some modifications. Five micrograms of anti-HEXIM1 (ab25388)
antibody was pre-bound to protein A Dynal magnetic beads (Life Technologies)
and incubated with 2.5 mg of HeLa cell nuclear extracts to immobilize the inactive
7SK snRNP complex. The resulting immunocomplexes were incubated with increasing
amounts of purified Flag-DDX21 for 2 h on ice. A magnetic
separator was used to sequester the remaining HEXIM1-bound 7SK snRNP, and
the resulting eluates were collected and analysed by western blotting. For purification
of Flag-DDX21, nuclear extracts from HEK293 cells stably expressing Flag-DDX21
were prepared using a modified version of ref. 35. Purified nuclei were
extracted in Dignam and Roeder buffer C and cleared by centrifugation. The salt
concentration of the cleared extracts was adjusted to 250 mM with Dignam and
Roeder buffer D. The resulting nuclear extracts were incubated with Flag-M2 agarose
beads (Sigma) for 2 h to immobilize Flag-DDX21, followed by stringent washes:
5 × 2 min with Dignam and Roeder buffer C at an NaCl concentration of 500 mM
and 1 × 5 min with buffer D-C (20 mM HEPES, 20% glycerol, 0.1 mM EDTA, 150 mM
NaCl, 0.75 mM MgCl2 and 0.05 M KCl). Flag-DDX21 was eluted with the 3× Flag
peptide (Sigma) in buffer D-C.

Extended Data Figure 1 | DDX21 associates with non- and protein-coding
ribosomal genes. a, MEME analysis of DDX21-bound regions defined by 
DDX21 ChIP-seq. Motif logo, annotated transcription factor, number of motif 
instances within the ChIP-seq regions, Z score, and P value for each motif 
are shown. b, DDX21 ChIP-qPCR from HeLa cell chromatin extracts with 
primers spanning a representative number of loci found to be enriched in the 
DDX21 ChIP-seq analyses from HEK293 cells. Data are mean and s.d. of three 
independent experiments. c, Comparison of DDX21 (this study) and
H3K4me3 (publicly available data, see Methods for accession numbers)
ChIP-seq-bound regions. 2,863 regions are common between the data sets,
505 regions are unique to DDX21, and 11,403 regions are unique to the
H3K4me3 data set. d, Gene Ontology terms for H3K4me3 regions that are
either DDX21-bound (left) or not bound by DDX21 (right). e, Box plots
representing the expression levels of snoRNA-host genes whose promoter
regions are either bound or not by DDX21. As shown, snoRNA-host gene
promoters bound by DDX21 are, on average, more highly expressed than
those not occupied by DDX21. Fragments per kilobase of exon per million
mapped reads (FPKM) values were taken from publicly available HEK293
RNA-seq data (see Methods for accession number). The P value (P = 0.05) was
calculated using the Wilcoxon signed-rank test. f, g, UCSC genome browser
tracks depicting DDX21 ChIP-seq and iCLIP-seq, and RNA-seq enrichment
profiles at differentially expressed snoRNA-host genes in HEK293 cells.

Extended Data Figure 2 | DDX21 positively regulates transcription of Pol I-
and Pol II-dependent ribosomal genes. a, siRNA-mediated knockdown of 
the DDX21 antibody used for ChIP. We transfected HEK293 cells with two 
different sets of siRNAs targeting endogenous DDX21 mRNA (siRNA1 and 
siRNA2 (3’ UTR)) and performed western blots with the indicated antibodies. 
As shown, the DDX21-specific band is diminished in cells transfected with 
DDX21-targeting siRNAs, but not with control siRNAs. Actin was used as a 
loading control for this experiment. b, RT-qPCR analysis assessing the RNA 
expression levels of the same genes analysed in Fig. 1h upon DDX21 
knockdown by a second siRNA that targets the 3’ UTR of DDX21 mRNA. 
Data are mean and s.d. of three independent experiments. For DDX21-target
genes the difference between control and DDX21 siRNA is significant,
P = 0.05 (Student's t-test). c, Diagram of DDX21 protein domains. The two
conserved RecA-like (A and B) domains and the GUCT domains are shown in
green and blue, respectively. Amino acids targeted for mutation to convert
DDX21^WT into DDX21^SAT, the ATP-hydrolysis mutant, are indicated with red
and purple lines in the diagram. Specific amino acid changes are displayed
below. d, RT-PCR analysis assessing nascent unspliced mRNA levels from
additional DDX21-target and DDX21-non-target promoters. For a detailed
description see Fig. 1j. Data are mean and s.d. of three biological replicates.
e, Nuclear rRNA abundance analysis by RNA BioAnalyzer of HEK293 cells
depleted of DDX21 and rescued with DDX21^WT, DDX21^SAT, or DDX21^DEV.
For each analysis, total RNA was isolated from 1,500,000 nuclei. Total
nanogram amounts are shown for each of the two large rRNA subunits.

Extended Data Figure 3 | Selective inhibition of Pol I alters DDX21 nuclear
localization and chromatin association. a, b, Immunofluorescence images
of methanol-fixed HEK293 cells after 1 h incubation with either DMSO or 2 µM
of the specific Pol I inhibitor CX-5461. DDX21 (a) and fibrillarin (b) immunolabellings
are shown. Scale bars, 10 µm. c, d, ChIP-qPCR analyses from
HEK293 cells sampling DDX21 genomic occupancy, at the rDNA locus (c) and at a
representative panel of Pol II-regulated gene promoters (d), after treatment
with DMSO or CX-5461. Data are mean and s.d. of three independent
experiments. As displayed, inhibition of Pol I alters DDX21 nuclear localization
and this coincides with nearly complete eviction of DDX21 from Pol I- and
Pol II-regulated genes. e, f, ChIP-qPCR analyses from HEK293 cells treated
with 50 ng ml⁻¹ of actinomycin-D for 1 h. Binding of the transcriptional
repressor CTCF across the rDNA locus (e) and the c-MYC insulator element
(MINE) (f) demonstrates that actinomycin-D treatment does not affect
CTCF binding to chromatin. Red arrow indicates relative location
of the CTCF DNA-binding site (DBS) at the rDNA locus. Data are mean
and s.d. of three independent experiments.

Extended Data Figure 4 | DDX21 nuclear re-localization is preferentially
sensitive to acute transcriptional inhibition over other cellular stressors.
Immunofluorescence analyses of HEK293 cells after targeting different
metabolic pathways. For inhibition of mitogen, cells were starved for 16 h in the
absence of serum. For cellular respiration inhibition, cells were treated with
either oligomycin (100 µM) or 2-deoxy-D-glucose (10 mM) for 1 h. To inhibit
the mTOR pathway, cells were treated with 250 nM of either Torin 1 or
rapamycin for 2 h.


©2015 Macmillan Publishers Limited. All rights reserved 




[Extended Data Figure 5 image. a, 5′ 32P RNA autoradiogram with western blots (WB: IP, anti-HA; WB: input, anti-HA; WB: input, anti-actin); lanes differ in antibody (IgG + HA or FLAG + HA), RNaseA amount and UV (254 nm) treatment. b, Workflow: UV-C crosslinking; cell lysis and partial RNaseA digestion; FLAG IP; 1 M NaCl + 1% Na-DOC + 1% Triton X-100 wash; FLAG peptide elution; HA IP; 1 M NaCl + 1% Na-DOC + 1% Triton X-100 wash; SDS-PAGE, transfer and ProtK digestion; library prep; Illumina sequencing and iCLIP data analysis. c-e, Scatter plots (panel titles include snoRNA Comparison and rRNA Comparison) of normalized iCLIP count per snoRNA, at the rDNA locus and per mRNA, comparison dataset versus DDX21 rep1. f, Fraction recovered for targets including scaRNA2, snorD15B, snorA62 and TERC.]


Extended Data Figure 5 | Tandem affinity iCLIP of FH-DDX21^WT.
a, FH-DDX21^WT iCLIP 32P-autoradiogram and western blots. All samples
were loaded with constant input lysate amounts (actin loading). FH-DDX21^WT
was isolated from HEK293 cells induced to express the transgene for 24 h
and crosslinked with ultraviolet light (top panel same as Fig. 3a). b, Schematic of
the modified iCLIP procedure. To achieve high stringency and specificity, Flag-HA-DDX21^WT
is first purified on anti-Flag-M2 agarose beads, washed with
1 M NaCl, 1% Triton X-100 and 1% sodium deoxycholate. Complexes are
specifically eluted with Flag peptide and recaptured with anti-HA agarose.
Standard iCLIP steps were performed thereafter to generate deep sequencing
libraries. c-e, Scatter plot analysis of iCLIP reverse transcription stops on
snoRNAs, rRNA and mRNAs within the FH-DDX21^WT (this study) and
hnRNP-C (ref. 16; publicly available data) data sets. Little concordance
between the data sets is evident, suggesting specific transcriptome targets of
these two RNA binding proteins (RBPs). f, DDX21^WT ultraviolet RNA
immunoprecipitation qRT-PCR of FH-DDX21^WT was performed in three
conditions: native HEK293 cells crosslinked with ultraviolet light; FH-DDX21^WT
HEK293 cells without crosslinking; and FH-DDX21^WT HEK293
cells with ultraviolet crosslinking. snoRNAs, scaRNAs and TERC were
validated targets identified in the sequencing data. Each experiment was
performed in biological duplicates (rep1 and rep2) and error bars represent s.d.
of technical triplicates.

Extended Data Figure 6 | Tandem affinity iCLIP of FH-DDX21^SAT.
a, FH-DDX21^SAT was isolated from HEK293 cells induced to express the
transgene for 24 h, at which point we did not observe significant dominant
negative effects. iCLIP was performed as described for FH-DDX21^WT, and
biological duplicates of FH-DDX21^SAT iCLIP 32P-autoradiogram and western
blots (lanes 2 and 3) are shown. All samples were loaded with constant input
lysate amounts (actin loading). FH-DDX21^WT was loaded as a control. WB,
western blot. b, Left, DDX21^SAT iCLIP reads annotated to known repetitive
(rRNA and snRNAs) and non-repetitive (hg19 genome build: mRNAs and
snoRNAs) regions of the human genome. Categories are noted with their
respective percentage of the total iCLIP experiment. Right, enriched Gene
Ontology and KEGG pathway terms from DDX21^SAT-bound mRNAs obtained
using the DAVID tool. The x axis values (in log scale) correspond to the
negative Benjamini P value. c, Distribution of all DDX21^SAT-bound snoRNAs,
representing C/D box, H/ACA box and scaRNAs. The number (n) and fraction
(per cent) of each snoRNA type is displayed. d, Comparison of the snoRNAs
bound by DDX21^WT and DDX21^SAT, revealing significant overlap between
the active and catalytically inactive DDX21. e, DDX21^SAT iCLIP reads mapped
to the transcribed region of the rDNA. f, DDX21^WT (left) and DDX21^SAT
(right) iCLIP reads mapped to the repetitive U3 snoRNA. Binding is
represented as reverse transcription stops per nucleotide normalized to the
total number of reverse transcription stops mapping to the U3 snoRNA.
Two strong binding sites are evident between bases 25-40 and 175-185 of
U3 in DDX21^WT iCLIP, whereas the 5′ binding site is reduced in DDX21^SAT.
nts, nucleotides. g, RT-PCR analysis assessing the expression levels of
several snoRNAs 24 h after expression of either DDX21^WT or DDX21^SAT.
This experiment was performed in biological duplicates. Data are mean and s.d.
of technical triplicates.

Extended Data Figure 7 | DDX21 functionally interacts with snoRNAs and
the snoRNP. a, UCSC genome browser view of DDX21^WT iCLIP reads across
the snorD66 snoRNA. The C box [C] and D box [D] regions are highlighted
in red. b, Same visualization as in a but showing the snorA67 snoRNA with the
H box [H] and ACA box [ACA] regions highlighted. c, Immunoprecipitation
of NOP58 from HEK293 nuclear extracts confirms DDX21 as a protein
member of the snoRNP machinery. As a control for this experiment we
performed western blots against FBL, a well-known NOP58-interacting partner
and an essential factor of the snoRNP machinery. d, DDX21 interacts with
XRN2, a 5′-3′ exoribonuclease required for maturation and processing of
snoRNAs. The DDX21-XRN2 interaction appears to be bridged by RNA,
as treatment of the nuclear lysates with RNaseA abolishes the interaction.
e, Schematic of the site-directed RNaseH cleavage of RNA sensitive to 2′-Ome.
RNA of interest is hybridized to a 2′-Ome/DNA chimaeric oligonucleotide in
which the DNA nucleotides specifically target the ability of RNaseH to
interrogate the 2′-Ome status of a single nucleotide. 2′-Ome will inhibit
RNaseH and leave intact RNA, while unmethylated RNA will be cleaved.
f, UCSC genome browser view of DDX21^WT and DDX21^SAT iCLIP reads across
the snoRNAs responsible for guiding the modifications tested in Fig. 3f.

Extended Data Figure 8 | Association of DDX21 with the RNA and protein
components of the 7SK snRNP. a, Comparison of DDX21^WT ChIP-seq
targets to DDX21^WT iCLIP-bound mRNAs. The numbers of unique and [...]
not immunoprecipitated by iCLIP but that some are recovered in both assays.
b, iCLIP read distribution of DDX21^WT-target mRNAs categorized by the
regions within mRNAs that were bound. Most iCLIP reads fell outside the 5′
UTR. c, DDX21^WT iCLIP reads mapping to short repetitive RNAs of the human
genome. Percentages of the top four short repetitive RNAs are shown.
d, Secondary structure model of the 7SK snRNA annotated with iCLIP reverse
transcription stops identified from the DDX21^WT (blue) and DDX21^SAT
(orange) experiments. Nucleotides commonly crosslinked are labelled in green.
Known RNA binding protein sites: HEXIM1/2 is highlighted in purple; P-TEFb [...]
highlighted in green. e, Co-immunoprecipitation analysis of HEXIM1
as assayed by western blotting for DDX21^WT and 7SK snRNP components
(CDK9 and LARP7). f, Immunoprecipitation of Flag-HA-DDX21^WT from
HEK293 nuclear extracts confirms DDX21 as a protein component of the 7SK
snRNP through co-recovery of LARP7. The abundant protein actin, which is
not part of the 7SK snRNP, was not recovered.

Extended Data Figure 9 | Binding of the DDX21-7SK snRNP at ribosomal
gene promoters. a, b, ChIP-qPCR of CDK9 (a) and HEXIM1 (b) in HEK293
cells at representative Pol II-regulated, DDX21-target and -non-target gene
promoters, negative control regions, and the rDNA locus. c, ChIP-qPCR of
DDX21 in control or 3′-7SK-ASO-treated HEK293 cells at representative Pol [...]
rDNA locus. For the promoter-associated genes, P = 0.05 (Student's t-test)
when compared to control ASO. d, Gene Ontology molecular function and
cellular component analysis of publicly available CDK9 ChIP-seq data.
e, ChIP-qPCR of total Pol II in control or DDX21-targeting siRNA-treated
HEK293 cells at representative TSSs of Pol II-regulated, DDX21-target gene
promoters and negative control regions. Data are mean and s.d. of three [...]




[Extended Data Figure 10 image. a, FH-DDX21 ChIP-qPCR for DDX21^WT and DDX21^SAT; y axis, ChIP-qPCR enrichment (fold over negative); x axis, promoter-associated genes (TSS). c, Fraction of RT stops (on 7SK), with stem loops marked below. d, Model with labels for the nucleus, the snoRNP complex, the 7SK snRNP complex, 7SK and the TSS.]


Extended Data Figure 10 | Catalytically inactive DDX21 is still
incorporated into the 7SK snRNP. a, ChIP-qPCR of DDX21^WT (black) and
DDX21^SAT (green) in HEK293 cells at representative TSSs of Pol II-regulated,
DDX21-target gene promoters and negative control regions. Data are mean
and s.d. of three independent experiments. b, Immunoprecipitation of
DDX21^WT, DDX21^DEV or DDX21^SAT from HEK293 nuclear extracts confirms
DDX21 interacts with CDK9 (P-TEFb) regardless of its catalytic activity.
c, DDX21^WT (blue) and DDX21^SAT (orange) annotated iCLIP reads mapped
across the 7SK snRNA. The four annotated stem-loops are marked below the
graph. d, Model of multi-level control of ribosomal pathway by DDX21. In
the nucleolus, DDX21 associates with the chromatin across the transcribed
region of the rDNA and is a component of the snoRNP. Furthermore, DDX21
functionally interacts with the rRNA, snoRNAs and snoRNP to control 2′-Ome
deposition on the rRNA in a helicase activity-dependent manner. In the
nucleoplasm, DDX21 is bound to the promoter regions of ribosomal Pol II-transcribed
genes, many of which contain precursor snoRNA transcripts.
Mechanistically, DDX21 activates transcription of its target genes through the
7SK-P-TEFb axis. As part of the 7SK snRNP, DDX21 can facilitate the release
of P-TEFb from the inhibitory complex in a manner dependent on ATP
hydrolysis, leading to productive Pol II elongation and increased
phosphorylation of Ser 2. Efficient transcription of its target genes enforces high
expression of both snoRNAs and other ribosomal proteins critical for the rRNA
maturation process, placing DDX21 as a central operator of the ribosomal
pathway.

Mammalian polymerase θ promotes alternative NHEJ
and suppresses recombination


Pedro A. Mateos-Gomez, Fade Gong, Nidhi Nair, Kyle M. Miller, Eros Lazzerini-Denchi & Agnel Sfeir


The alternative non-homologous end-joining (NHEJ) machinery facilitates
several genomic rearrangements, some of which can lead to cellular
transformation. This error-prone repair pathway is triggered upon telomere
de-protection to promote the formation of deleterious chromosome end-to-end
fusions. Using next-generation sequencing technology, here we show that
repair by alternative NHEJ yields non-TTAGGG nucleotide insertions at fusion
breakpoints of dysfunctional telomeres. Investigating the enzymatic activity
responsible for the random insertions enabled us to identify polymerase
theta (Polθ; encoded by Polq in mice) as a crucial alternative NHEJ factor
in mammalian cells. Polq inhibition suppresses alternative NHEJ at
dysfunctional telomeres, and hinders chromosomal translocations at
non-telomeric loci. In addition, we found that loss of Polq in mice results
in increased rates of homology-directed repair, evident by recombination of
dysfunctional telomeres and accumulation of RAD51 at double-stranded breaks.
Lastly, we show that depletion of Polθ has a synergistic effect on cell
survival in the absence of BRCA genes, suggesting that the inhibition of this
mutagenic polymerase represents a valid therapeutic avenue for tumours
carrying mutations in homology-directed repair genes.

Chromosome end-to-end fusions are inhibited by shelterin, a multi-subunit
complex anchored to telomeric DNA by two Myb-containing proteins, TRF1 and
TRF2 (ref. 4). Telomere fusions are executed by two independent end-joining
pathways. Classical non-homologous end-joining (C-NHEJ), mediated by LIG4 and
the Ku70/80 heterodimer, is primarily blocked by TRF2 (ref. 5). Conversely,
alternative NHEJ (alt-NHEJ), which is dependent on LIG3 (ref. 6) and PARP
(ref. 7), is repressed in a redundant manner. Alt-NHEJ is fully unleashed
after the simultaneous deletion of TRF1 and TRF2, and the creation of
shelterin-free telomeres in cells deficient for Ku70 and Ku80 (also known as
Xrcc6 and Xrcc5, respectively). This error-prone end-joining pathway mediates
fusion of naturally eroded telomeres, joining of switch regions during
class-switch recombination, and formation of chromosomal translocations in
mouse cells.

To characterize the differences between C-NHEJ and alt-NHEJ at dysfunctional
telomeres, we determined whether the sequence of the junction between two
fused telomeres differed depending on the type of repair pathway used.
Telomere fusions by C-NHEJ were triggered by Cre-mediated depletion of TRF2
using previously described mouse embryonic fibroblasts (MEFs) (Trf2F/F
Cre-ERT2) (Extended Data Fig. 1a). To induce robust fusions by the alt-NHEJ
pathway, we depleted the entire shelterin complex by deleting Trf1 and Trf2
from Trf1F/F Trf2F/F Ku80−/− Cre-ERT2 MEFs (Extended Data Fig. 1a). DNA was
subjected to next-generation sequencing, and reads corresponding to telomeres
were identified on the basis of the presence of at least three consecutive
TTAGGG repeats. To detect rare reads containing fusion junctions, we exploited
the novel sequence arrangement created by the ligation of the 3’ G-rich
strand (TTAGGG-3’) to the 5’ C-rich strand (5’-CCCTAA) (Fig. 1a), and
filtered reads that started with at least three G-rich repeats and ended with
two or more C-rich repeats. We confirmed that this approach could successfully
identify telomere fusions by comparing reads derived from Trf2-proficient and
Trf2-deficient cells. Starting with a similar number of telomere-repeat
containing reads, we identified >90 fusogenic events in Trf2-knockout MEFs,
compared to only three events in wild-type cells (Fig. 1b). Sequence analysis
of the junctions highlighted different permutations of TTAGGG/AATCCC
sequences. Notably, the spectrum of the fusion junctions was different in
shelterin-free settings, in which frequent non-telomeric nucleotide insertions
(9 out of 46 events) were identified at fusion breakpoints (Fig. 1b–d and
Supplementary Information).

Figure 1 | Random nucleotide insertions at the junction of telomeres fused
by alt-NHEJ. a, Schematic of the junction of a telomere fusion. The 3’ end
of the telomeric G-rich strand of a chromosome (blue) is fused to the 5’ end of
the C-rich strand of a different chromosome (red). b, Illumina sequencing to
analyse telomere fusion junctions. Reads with ≥3×TTAGGG consecutively were
scored as derived from telomere fragments. Those with ≥3×TTAGGG on the
5’ end and ≥2×CCCTAA at the 3’ end were scored as telomere fusion
junctions (see Supplementary Information). c, Examples of telomere fusions
generated by C-NHEJ of TRF2-depleted telomeres. Light grey highlights fusion
junctions, dark grey marks the flanking telomere repeats. d, Examples of
insertions in shelterin-free Ku80-null MEFs. e, Telomere fusions in metaphase
spreads from Trf1F/F Trf2F/F Ku80−/− Cre-ERT2 MEFs. Telomeres in red
(peptide nucleic acid (PNA) probe) and chromosomes in blue
(4’,6-diamidino-2-phenylindole; DAPI). f, Frequency of telomere fusions after
the depletion of candidate polymerases. 4-OHT, 4-hydroxytamoxifen; shCtrl,
control shRNA. Bars represent mean of n > 1,000 chromosome ends derived from
one experiment.
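The read-selection rule described in the text and in Fig. 1b (at least three consecutive TTAGGG repeats for a telomeric read; at least three G-rich repeats at the start and two or more C-rich repeats at the end for a fusion junction) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `classify_read`, the regular expressions, and the requirement that the repeats sit exactly in register at the two read ends are assumptions made for illustration only.

```python
import re

G_REPEAT = "TTAGGG"  # telomeric G-rich repeat
C_REPEAT = "CCCTAA"  # telomeric C-rich repeat

# Assumption: a read is "telomeric" if it contains >= 3 consecutive G-rich repeats.
TELOMERIC = re.compile(f"(?:{G_REPEAT}){{3,}}")
# Assumption: a "fusion junction" read starts with >= 3 G-rich repeats and
# ends with >= 2 C-rich repeats; anything may lie in between.
FUSION = re.compile(f"^(?:{G_REPEAT}){{3,}}.*(?:{C_REPEAT}){{2,}}$")


def classify_read(seq: str) -> str:
    """Return 'fusion_junction', 'telomeric' or 'other' for one read."""
    seq = seq.strip().upper()
    if FUSION.match(seq):
        return "fusion_junction"
    if TELOMERIC.search(seq):
        return "telomeric"
    return "other"


if __name__ == "__main__":
    # Toy reads, invented for this example.
    reads = [
        "TTAGGG" * 4,                          # telomere fragment
        "TTAGGG" * 3 + "ACGT" + "CCCTAA" * 2,  # junction with a 4-nt insertion
        "ACGTACGTACGT",                        # unrelated sequence
    ]
    for read in reads:
        print(classify_read(read), read)
```

A production filter would probably also need to handle reads that begin mid-repeat and reads from the complementary strand; both are left out here to keep the rule readable.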

To identify the enzyme that incorporated nucleotides at dysfunctional 
telomeres, we depleted known low-fidelity DNA polymerases in shelterin- 
free cells lacking Ku80, and analysed chromosome-end fusions on meta- 
phase spreads. Notably, we observed a reduction in the frequency of 
telomere fusions in cells with reduced levels of polymerase theta (Polθ,
encoded by Polq in mice) (Fig. 1e, f and Extended Data Fig. 1b). The
activity of Polθ is specific to alt-NHEJ as its inhibition in Trf2-knockout
cells did not affect the frequency of C-NHEJ (Fig. 2a, b and Extended
Data Fig. 2a–c).

Polθ is an A-family DNA polymerase that exhibits low fidelity on templated
DNA, and also displays a terminal transferase-like activity that catalyses
nucleotide addition in a template-independent manner. The relevance of these
activities in vivo was highlighted in Drosophila melanogaster, in which Polθ
was shown to stimulate nucleotide insertions during double-stranded break
(DSB) repair by alt-NHEJ. More recently, Polθ was shown to promote end-joining
of replication-associated DSBs in Caenorhabditis elegans, preventing large
deletions around G-rich DNA. The exact function of Polθ during DSB repair in
mammalian cells remains elusive.

The crucial role for Polθ at dysfunctional telomeres prompted us to test
whether it is required for DSB repair at non-telomeric loci. To this end, we
tested whether the depletion of Polθ affects chromosomal translocations in
the context of mouse pluripotent cells, reported to be mediated by alt-NHEJ
in a LIG3-dependent manner. To model chromosomal translocations, we induced
DSBs in the Rosa26 and H3f3b mouse loci using the CRISPR/Cas9 system
(Fig. 2c). When introduced into Polq+/+ and Polq−/− cells, the
Cas9-gRNA(Rosa26;H3f3b) expression plasmid induced simultaneous cleavage of
both loci with comparable efficiencies (Extended Data Fig. 2d, e). Consistent
with previous reports, 24% of translocation events in Polq+/+ cells were
scarred by random insertions. Interestingly, the overall frequency of
translocations in cells lacking Polq was significantly reduced (Fig. 2d).
Sequence analysis of residual translocations in Polq−/− cells highlighted the
absence of insertions, and a concomitant decrease in micro-homology at
junctions (Fig. 2d and Extended Data Figs 2-5). Notably, we observed similar
results when assessing translocation frequency in cells expressing a
catalytically inactive form of Polθ (Extended Data Fig. 2g–k). Altogether,
our data indicate that the promiscuous activity of Polθ during DSB repair
contributes to the increased mutagenicity of alt-NHEJ. Importantly, our
results indicate that mammalian Polθ plays a critical part by stimulating the
end-joining reaction.

Figure 2 | Polθ is required for alt-NHEJ-dependent DSB repair in
mammalian cells. a, Metaphases from TRF2-depleted (Trf2F/F Cre-ERT2 +
4-OHT) and shelterin-free (Trf1F/F Trf2F/F Ku80−/− Cre-ERT2 + 4-OHT)
MEFs infected with the indicated short hairpin RNA (shRNA).
b, Quantification of telomere fusions in MEFs with the indicated treatment
(mean values ± s.d. derived from six independent experiments. **P = 0.003;
two-tailed Student's t-test). c, Design of the translocation assay in which DSBs
are induced by Cas9-gRNA(Rosa26;H3f3b). Joining of DNA ends generates
der(6) and der(11), detected by nested PCR. d, Translocation frequency in
Polq+/+ and Polq−/− cells 60 h after Cas9-gRNA(Rosa26;H3f3b) expression.
Mean values ± s.d. derived from three independent experiments. **P = 0.009;
two-tailed Student's t-test.

We next investigated the upstream signalling event(s) required for the
recruitment of Polθ to DNA damage sites, induced after micro-irradiation of
HeLa cells expressing Myc-tagged Polθ. Accumulation of Polθ at laser-induced
DNA breaks, discerned by its co-localization with the phosphorylated histone
H2AX (γ-H2AX), occurred in ~25% of cells that stained positive for Myc
(Fig. 3a, b), and was independent of either ATM or ATR signalling (Extended
Data Fig. 6). Instead, co-localization of Polθ with γ-H2AX was reduced after
depletion of PARP1 with short interfering RNAs (siRNAs), or after the
inhibition of PARP1 activity (using KU58948) (Fig. 3a, b). In a parallel
experiment that used a recently developed U2OS-DSB reporter cell line, we were
able to ascertain the localization of Myc–Polθ to bona fide DSBs, induced
after FOK1 cleavage of a LacO-tagged genomic locus (Extended Data Fig. 7). In
conclusion, our data suggest that PARP1, previously known to be required for
alt-NHEJ, facilitates the recruitment of Polθ to DSBs.

Figure 3 | Polθ is recruited by PARP1 to promote alt-NHEJ at the expense of
HDR. a, Myc–Polθ localization to DNA damage was monitored after laser
micro-irradiation of HeLa cells. Cells were fixed and stained for γ-H2AX and
Myc, 1 h after damage induction. b, Quantification of Polθ accumulation at
sites of laser damage (mean values ± s.e.m. derived from two independent
experiments). c, To test whether Polθ represses recombination at telomeres, we
depleted the polymerase in shelterin-free and Lig4-deficient MEFs, and both
repair pathways were monitored using CO-FISH. White arrows indicate alt-NHEJ
events, red arrows highlight HDR-mediated T-SCEs. d, Quantification of
telomere fusion (alt-NHEJ) and T-SCE (HDR) in cells transduced with
shRNAs against Polq, Lig3 or control shRNA. Error bars denote ± s.d. from
three independent experiments. e, Immunofluorescence for RAD51 and
γ-H2AX in the indicated MEFs 3 h after irradiation. f, Graph representing
quantification of ionizing-radiation-induced RAD51 foci. Mean values ± s.d.
derived from three independent experiments. *P < 0.05, **P < 0.01; two-tailed
Student’s t-test.

Homology-directed repair (HDR) is prevalent during the S/G2 phase of the cell
cycle, which coincides with the peak of alt-NHEJ activity, and these pathways
also share the initial resection step mediated by MRE11 and CtIP. To test
whether inhibiting alt-NHEJ could potentially result in increased HDR, we
depleted shelterin in Lig4-deficient MEFs, a genetic setting that is
conducive to the activity of alt-NHEJ as well as HDR. To investigate the
relative contribution of the two repair pathways, we used a
chromosome-orientation fluorescence in situ hybridization (CO-FISH) assay, and
monitored the exchange of telomeres between sister chromatids by HDR (telomere
sister chromatid exchange, T-SCE), and, at the same time, measured the
frequency of chromosome end-end fusion by end-joining (Fig. 3c). After
depletion of shelterin from Trf1F/F Trf2F/F Lig4−/− Cre-ERT2 MEFs, ~10% of the
telomeres were processed by alt-NHEJ, whereas ~5% of chromosome ends showed
T-SCEs (Fig. 3c, d and Extended Data Fig. 8a–c). As expected, we observed a
substantial reduction in the frequency of alt-NHEJ at shelterin-free telomeres
in Lig4−/− cells that lack Polq or Lig3 (Fig. 3d and Extended Data Fig. 8a–e).
Remarkably, Polq-depleted cells exhibited a concomitant increase in T-SCE,
which was not evident in cells lacking Lig3 (Fig. 3d), thereby highlighting a
unique role for Polθ in counteracting HDR. To gain insight into this novel
Polθ function, we show that the promiscuous polymerase is not required for
end-resection of DSBs (Extended Data Fig. 8f, g). Instead, its activity
counteracts the accumulation of RAD51 foci (Fig. 3e, f and Extended Data
Fig. 8h). To corroborate these findings, we used the traffic light reporter
(TLR) system, designed to generate a flow-cytometric readout for HDR and
end-joining at a site-specific DNA break induced by I-SceI (ref. 22). We
observed that after knocking down Polq in Lig4−/− cells, resolution of the
I-SceI-induced DNA break by HDR is increased, in conjunction with a
significant reduction in the frequency of alt-NHEJ (Extended Data Fig. 9).

Alt-NHEJ is often considered as a back-up choice for DSB repair, operating at
the expense of genomic stability. Circumstantial evidence suggests that this
pathway could be enhanced when HDR is impaired. We therefore postulated that
this error-prone mode of repair has an essential role in cells with
compromised HDR activity. We tested this hypothesis by inhibiting Polq in
cells lacking the breast cancer susceptibility genes Brca1 and Brca2.
Chromosome analysis revealed a fourfold increase in chromosomal aberrancies
after Polq depletion in MEFs lacking either Brca1 or Brca2. Such aberrancies
included chromatid and chromosome breaks, in addition to radial chromosome
structures characteristic of Lig4-mediated processing of chromatid breaks via
the C-NHEJ pathway (Fig. 4a, b and Extended Data Fig. 10a, b). Ultimately,
the increased genomic instability in cells co-depleted for Polq and Brca
genes compromised cellular survival. We observed that BRCA1-mutated human
cells (Fig. 4c, d), and mouse cells lacking Brca1 (Extended Data
Fig. 10c–f), displayed significantly reduced colony-forming capabilities
after Polq impairment. Although we cannot exclude that Polθ performs
additional activities required for the survival of Brca-deficient cells, our
data suggest that Polθ-mediated alt-NHEJ promotes the survival of cells with
a compromised HDR pathway. In the absence of a safer means to repair breaks,
alt-NHEJ may therefore prevent genomic havoc by resolving unrepaired lesions.

Here we provide direct evidence linking Polθ to alt-NHEJ repair in mammalian
cells (Fig. 4e). We also show that while Polθ hinders error-free repair by
HDR, its activity is essential for the survival of HDR-deficient cells
(Fig. 4e). The question remains as to how this promiscuous polymerase
orchestrates DSB repair. After DSB formation in the S/G2 phase of the cell
cycle, resection of DSBs by MRE11 and CtIP potentially exposes micro-homology
that allows spontaneous annealing of broken DNA ends (Fig. 4e). The binding of
RPA antagonizes this annealing step to promote HDR-mediated repair. An
opposing activity is likely to be exerted by Polθ. Placing our finding in the
context of recent biochemical experiments and genetic studies in model
organisms, we ultimately propose a model in which Polθ exploits both its
template-independent and template-dependent activities to stabilize the
annealed intermediate structure and channel repair towards the alt-NHEJ
pathway (Fig. 4e). Our findings that Polθ is critical for alt-NHEJ support
this model, which provides a potential explanation as to how this polymerase
counteracts HDR.

Finally, it is intriguing that although POLQ expression in normal human
tissues is generally repressed, it is upregulated in a wide range of human
cancers and associates with poor clinical outcome in breast tumours. Our
findings that cells with compromised HDR activity depend on this mutagenic
polymerase for survival establish a rationale for the development of
Polθ-targeted approaches for cancer treatment.

Figure 4 | Polq inhibition in Brca-mutant cells leads to increased
chromosomal aberrancies and reduced cellular survival. a, Analysis of
genomic instability in metaphase spreads from Brca1F/F Cre-ERT2 MEFs
treated with shRNA against Polq or vector control. b, Quantification of breaks
(chromatid and chromosome) and radials in Brca1F/F Cre-ERT2 and Brca2F/F
Cre-ERT2 MEFs with the indicated treatment. Mean values are presented
with error bars denoting ± s.d. from three independent experiments.
c, Clonogenic survival after Polq depletion. Crystal violet staining of BJ-hTERT,
MCF7 and HCC1937 cells treated with shRNA against Polq or vector control.
d, Quantitative analyses of colony formation assay. Colonies in each control
shRNA cell line were set to 100%. Colonies in shPolq-expressing cells are
normalized to shCtrl. Mean values ± s.d. derived from three independent
experiments. *P < 0.05, **P < 0.01; two-tailed Student’s t-test. e, Schematic
depicting our model for the function of Polθ during DSB repair (see
Supplementary Information).


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper.

METHODS

Cell culture procedures. Trf2F/F Cre-ERT2, Trf1F/F Trf2F/F Ku80−/− Cre-ERT2 and
Trf1F/F Trf2F/F Lig4−/− Cre-ERT2 MEF lines were previously described. Polq+/+
and Polq−/− MEFs were a gift from N. Shima. Brca1F/F Cre-ERT2 and Brca2F/F
Cre-ERT2 MEFs and U2OS-DSB reporter cells were a gift from R. Greenberg.
Trf1F/F Trf2F/F Lig4−/− Cre-ERT2 MEFs were derived from mice that were
deficient in p53. The remaining MEF lines were immortalized with
pBabeSV40LargeT. MEFs were cultured in DMEM supplemented with 10-15% FBS
(Gibco), 2 mM L-glutamine (Sigma), 100 U ml⁻¹ penicillin (Sigma), 0.1 µg ml⁻¹
streptomycin (Sigma), 0.1 mM non-essential amino acids (Invitrogen) and 1 mM
sodium pyruvate (Sigma). Expression of Cre recombinase was induced by treating
MEFs carrying the Cre-ERT2 allele with 0.5 µM 4-OHT (Sigma H7904) for 12 h.
The t = 0 time point was set at the time of treatment with 4-OHT. BJ-hTERT and
MCF7 cells were grown in DMEM supplemented with 10% FBS. HCC1937 cells were
grown in RPMI medium (Gibco) containing 15% FBS. U2OS-DSB reporter cells were
grown in DMEM supplemented with 10% BCS. Human HeLa cells were grown in DMEM
supplemented with 10% FBS, 100 U ml⁻¹ penicillin, 100 µg ml⁻¹ streptomycin and
2 mM L-glutamine. Mouse embryonic stem cells were grown in DMEM supplemented
with 15% FBS (ES-qualified FBS) (Gibco), 2 mM L-glutamine (Sigma), 100 U ml⁻¹
penicillin (Sigma), 0.1 µg ml⁻¹ streptomycin (Sigma), 0.1 mM non-essential
amino acids (Invitrogen), leukaemia inhibitory factor (LIF) and
2-β-mercaptoethanol (Gibco 21985). For inhibitor experiments, PARPi (KU58948,
Axon medchem), ATMi (KU-55933, Tocris) and ATRi (VE-821, Selleckchem) were all
used at a final concentration of 10 µM, and were applied to culture medium
2-4 h before irradiation. For ionizing radiation treatment, cells were exposed
to 1-10 Gy ionizing radiation by a Faxitron X-ray system (120 kV, 5 mA, dose
rate 5 Gy min⁻¹) and recovered for 4 h before immunofluorescence analysis.

Lentiviral delivery of shRNA. shRNA treatments were carried out before 4-OHT 
treatment. shRNAs (see below for a list of sequences) were introduced by two
lentiviral infections at 12 h intervals using supernatant from transfected 293T cells.
Parallel infection with pLKO.1 was used as a negative control. Cells were selected 
with puromycin for 3 days. 

TLR assay. Lentiviral constructs coding for TLR (31482) and I-SceI with donor
e-GFP (31476) were purchased from Addgene. To avoid the confounding effect
of classical-NHEJ on the repair of I-SceI-induced DNA breaks, we stably
integrated the TLR construct into Ku80−/− and Lig4−/− MEFs. The plasmid was
transduced by two lentiviral infections at 12 h intervals using supernatant
from transfected 293T cells. Cells with integrated TLR were selected with
puromycin for 5 days. Cells were then transduced with concentrated Polq shRNA
lentiviral particles followed by I-SceI. Cells were collected 72 h later
without further antibiotic selection and analysed on a BD LSRII. eGFP
fluorescence, which reflects HDR repair, was measured using a 488-nm laser for
excitation and a 530/30 filter for detection. mCherry fluorescence, indicative
of alt-NHEJ, was measured by using a 561-nm laser for excitation and a 610/20
filter for detection. Data were analysed using FlowJo software.

Detection of telomeric fusions. To enrich for telomeric DNA, genomic DNA was
digested with two frequent cutters (AluI and MboI) and fragments greater than
10 kilobases (kb) were isolated. The resulting DNA was used to generate a library
using the NEBNext Ultra Library Prep Kit and the NEBNext Multiplex Oligos for
Illumina (NEB) following the manufacturer's instructions. The resulting library
was run on an Illumina HiSeq platform generating 100-base-pair (bp) indexed
paired-end reads.

Transient transfection of cells and laser micro-irradiation. Full-length human
POLQ was cloned into pLPC-Myc vectors. HeLa cells were plated on
glass-bottomed dishes (Willco Wells). Myc–Polθ constructs were transfected
into HeLa and U2OS-DSB reporter cells with HilyMax (Dojindo) according to the
manufacturer’s instructions. Then, 24 h after transfection, cells were
pre-sensitized with 10 µM 5-bromo-2'-deoxyuridine (BrdU) in normal DMEM medium
for 20 h. After indicated treatments, cells were damaged by laser
micro-irradiation as previously described. After laser micro-irradiation,
cells were incubated for 1 h, then fixed and analysed by immunofluorescence
and microscopic imaging as described below. For PARP1 siRNA experiments, cells
were transfected with siCtrl (non-targeting pool, Thermo Scientific) or
siPARP1 (GGGCAAGCACAGUGUCAAAUU, Sigma), for 24 h before POLQ transfections
and subsequent treatments as described above.

Immunofluorescence and confocal microscopy. After the indicated treatments,
cells were processed and analysed for immunofluorescence as previously
described. In brief, cells were fixed with 2% (v/v) paraformaldehyde for
15 min at room temperature. Cells were washed with PBS, permeabilized with
0.5% (v/v) Triton X-100 for 10 min, and blocked with PBS containing 3% BSA.
Cells were incubated with the same buffer containing primary antibodies for
1 h at room temperature followed by secondary antibody incubations for 1 h at
room temperature. Cells were imaged and analysed with Z-stacked setting using
the FV10-ASW3.1 software on a Fluoview 1000 confocal microscope (Olympus). For
laser line quantification, >50 cells were counted for all conditions from two
independent experiments. The primary antibodies used for immunofluorescence
were γ-H2AX (p Ser139) (rabbit polyclonal, Novus, NB100-384) and c-Myc (mouse
monoclonal, Santa Cruz, sc-40). The secondary antibodies used for
immunofluorescence were Alexa Fluor 594 (rabbit) (Invitrogen, A11037) and
Alexa Fluor 488 (mouse) (Invitrogen, A11029). To analyse the recruitment of
Polθ to double-stranded breaks, U2OS-DSB reporter cells expressing Myc–Polθ
were analysed 4 h after treatment with shield and tamoxifen. Lastly, to
analyse RAD51 foci formation and its co-localization with γ-H2AX (p Ser139)
after ionizing radiation treatment, cells were treated with 0.2% Triton X-100
(in PBS) for 5 min on ice before fixation with paraformaldehyde. The primary
antibodies used for RAD51 immunofluorescence were γ-H2AX (p Ser139) (mouse
monoclonal, Novus, NB100-384) and RAD51 (rabbit polyclonal, Santa Cruz,
sc-8349).

FISH. Cells were collected at 96 h after treatment with 4-OHT to analyse the
frequency of telomere fusions. In brief, ~80% confluent MEFs were incubated
for 2 h with 0.2 µg ml⁻¹ colcemid (Sigma). The cells were collected by
trypsinization, resuspended in 0.075 M KCl at 37 °C for 30 min, and fixed
overnight in methanol/acetic acid (3:1) at 4 °C. The cells were dropped onto
glass slides and the slides were dried overnight. The next day, the slides
were rehydrated with PBS for 15 min then fixed with 4% formaldehyde for 2 min
at room temperature. Slides were digested with 1 mg ml⁻¹ pepsin, pH 2.2, at
37 °C for 10 min, washed three times with PBS and fixed again in 4%
formaldehyde for 2 min at room temperature. After three PBS washes, the slides
were incubated consecutively with 75%, 95% and 100% ethanol and allowed to air
dry for 30 min before applying hybridization solutions (70% formamide,
1 mg ml⁻¹ blocking reagent (Roche), 10 mM Tris-HCl, pH 7.2) containing
TAMRA-OO-(TTAGGG)3 PNA probes (Applied Biosystems). Slides were denatured by
heating for 3 min at 80 °C and hybridized for 2 h at room temperature. The
next day, the slides were washed twice for 15 min each in 70% formamide,
10 mM Tris-HCl, followed by three 5-min washes in 0.1 M Tris-HCl, pH 7.0,
0.15 M NaCl and 0.08% Tween-20. Chromosomal DNA was counterstained with DAPI
during the second PBS wash. Slides were mounted in antifade reagent (ProLong
Gold, Invitrogen) and images were captured with a Nikon Eclipse TI microscope
(see http://delangelab.rockefeller.edu/protocols).

CO-FISH. Cells were labelled with BrdU:BrdC (3:1, final concentration 10 µM)
for 14-16 h. Two hours before collection by trypsinization, 0.2 µg ml⁻¹
colcemid was added to the media. To fix the cells and drop metaphases on a
glass slide, the same procedure that was applied for FISH was followed. Slides
were treated with 0.5 mg ml⁻¹ RNase A (in PBS, DNase-free) for 10 min at
37 °C, incubated with 0.5 µg ml⁻¹ Hoechst 33258 (Sigma) in 2×SSC for 15 min at
room temperature, and exposed to 365-nm ultraviolet light (Stratalinker 1800
UV irradiator) for 30 min. The slides were then digested twice with 800 U
exonuclease III (Promega) at room temperature for 10 min each, washed with PBS
and dehydrated through an ethanol series of 70%, 95% and 100%. After
air-drying, slides were hybridized with TAMRA-OO-(TTAGGG)3 PNA probe in
hybridization solution (70% formamide, 1 mg ml⁻¹ blocking reagent (Roche) and
10 mM Tris-HCl, pH 7.2) for 2 h at room temperature. The slides were then
washed for a few seconds with 70% formamide and 10 mM Tris-HCl, pH 7.2, and
incubated with FITC-OO-(CCCTAA)3 PNA probe in hybridization solution for 2 h.
Slides were washed and mounted as described for FISH (see
http://delangelab.rockefeller.edu/protocols).

Western blot analysis. Cells were collected by trypsinization, lysed in
2× Laemmli buffer (100 mM Tris-HCl, pH 6.8, 200 µM dithiothreitol, 3% SDS,
20% glycerol and 0.05% bromophenol blue) at 1 × 10⁴ cells per microlitre. The
lysate was denatured for 10 min at 95 °C, and sheared by forcing it through a
28-gauge insulin needle ten times. Lysate from 1 × 10⁵ cells was loaded on an
SDS-PAGE gel and transferred to a nitrocellulose membrane. The membrane was
blocked in 5% milk in TBS with 0.1% Tween-20 and incubated with primary
antibody in TBS, 5% milk and 0.1% Tween-20 for 2 h at room temperature. The
following primary antibodies were used: Polθ (ab80906, Abcam); TRF1 (1449,
rabbit polyclonal); RAP1 (1252, rabbit polyclonal); phospho-CHK2 (Thr68)
(rabbit polyclonal, Cell Signaling); CHK2 (rabbit polyclonal, Cell
Signaling); phospho-CHK1 (Ser 345) (mouse monoclonal, Cell Signaling); CHK1
(mouse monoclonal, Santa Cruz); LIG3 (mouse monoclonal, Santa Cruz); Myc
(9E10; Calbiochem); γ-tubulin (clone GTU-88, Sigma); and PARP1 (polyclonal,
Cell Signaling). (See http://delangelab.rockefeller.edu/protocols.)

Chromosomal aberrancies. Cells were collected and dropped on microscope slides
as described for the FISH protocol. After the slides had dried overnight, they
were rehydrated in PBS, stained with 0.25 µg ml⁻¹ DAPI, dehydrated in a 70%,
95% and 100% ethanol series, mounted and imaged using a Nikon Eclipse TI
microscope.
Aberrancies were scored as a percentage of chromatid breaks, chromosome breaks, 
and chromosome radial structures compared to total number of chromosomes. 
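As a minimal sketch of the scoring arithmetic described above, the following Python function expresses each class of aberrancy as a percentage of the total number of chromosomes counted. The function name, argument names and example counts are hypothetical and are not taken from the study.

```python
def aberrancy_percentages(chromatid_breaks: int,
                          chromosome_breaks: int,
                          radials: int,
                          total_chromosomes: int) -> dict:
    """Express each aberrancy class as a percentage of all chromosomes scored."""
    if total_chromosomes <= 0:
        raise ValueError("total_chromosomes must be positive")
    counts = {
        "chromatid_breaks": chromatid_breaks,
        "chromosome_breaks": chromosome_breaks,
        "radials": radials,
    }
    result = {name: 100 * n / total_chromosomes for name, n in counts.items()}
    # Combined percentage across the three classes.
    result["total"] = 100 * sum(counts.values()) / total_chromosomes
    return result


if __name__ == "__main__":
    # Invented counts: 4 chromatid breaks, 2 chromosome breaks and 2 radials
    # among 400 chromosomes scored.
    print(aberrancy_percentages(4, 2, 2, 400))
```

Scoring against the total chromosome count, rather than per metaphase, keeps the percentages comparable between spreads that contain different numbers of chromosomes.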
Chromosomal translocation assay. Induced pluripotent stem cells were derived
from primary Polq+/+ and Polq−/− MEFs according to the standard Yamanaka
protocol. To perform the translocation assay, Polq+/+ and Polq−/− induced
pluripotent stem cells were transfected with 2 µg of Cas9-gRNA(Rosa26;H3f3b)
plasmid per million cells. We constructed Cas9-gRNA(Rosa26;H3f3b) by
introducing two guide RNAs (5'-GITTGGCTCGCCGGATACGGG-3’ for H3f3b;
5'-ACTCCAGTCTTTCTAGAAGA-3’ for Rosa26) into pX330 (Addgene, 42230). After
transfection, 1 × 10⁴ cells were seeded per well in a 96-well plate, and lysed
3 days later in 40 µl lysis buffer (10 mM Tris, pH 8.0, 0.45% Nonidet P-40 and
0.45% Tween 20). The lysate was incubated with 200 µg ml⁻¹ proteinase K for
2 h at 55 °C. Translocation detection was performed according to a previously
established protocol, using nested PCR. The primers used for the first PCR
reaction were Tr6-11-Fwd: 5’-GCGGGAGAAATGGATATGAA-3’ and
Tr6-11-Rev: 5’-TTGACGCCTTCCTTCTTCTG-3' for der(6), and
Tr11-6-Fwd: 5'-AACCTTTGAAAAAGCCCACA-3’ and
Tr11-6-Rev: 5'-GCACGTTTCCGACTTGAGTT-3’ for der(11). For the second round of
PCR amplification we used the primers
Tr6-11NFwd: 5’-GGCGGATCACAAGCAATAAT-3’ and
Tr6-11NRev: 5'-CTGCCATTCCAGAGATTGGT-3', and
Tr11-6NFwd: 5’-AGCCACAGTGCTCACATCAC-3’ and
Tr11-6NRev: 5'-TCCCAAAGTCGCTCTGAGTT-3’. The number of PCR-positive wells was
used to calculate the translocation frequency as previously described.
Amplified products from positive wells were sequenced to verify translocations
and determine the junction sequences.

Surveyor assay. Forty-eight hours after transfection, genomic DNA was extracted
with GE Healthcare Illustra Genomic Prep Mini Spin Kit (28-9042-76). The genomic
region encompassing the guide RNA target sites was amplified using Q5
High-Fidelity DNA polymerase (New England BioLabs) with the primers Rosa26-Fwd:
5′-TAAAACTCGGGTGAGCATGT-3′ and Rosa26-Rev: 5′-GGAGTTCTCTGCTGCCTCCTG-3′,
and H3f3b-Fwd: 5′-GCGGCGGCTTGATTGCTCCAG-3′ and
H3f3b-Rev: 5′-AGCAACTTGTCACTCCTGAGCCAC-3′. PCR fragments were
gel purified and the Surveyor assay was performed using a detection kit
(Transgenomic), according to the manufacturer’s instructions. Agarose gels (2%) were used to
visualize the bands after Surveyor digestion.
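
The text above only mentions visualizing the digested bands. If band intensities were quantified, one widely used estimate of the indel fraction is 1 − √(1 − f_cut), where f_cut is the cleaved fraction of the PCR product. The sketch below applies that formula. The formula and the function name are assumptions brought in for illustration, not something stated in this section.

```python
import math


def surveyor_indel_percent(uncut, cut_a, cut_b):
    """Estimate % indels from band intensities of a Surveyor digest.

    uncut: intensity of the undigested PCR product
    cut_a, cut_b: intensities of the two cleavage products
    """
    if min(uncut, cut_a, cut_b) < 0:
        raise ValueError("band intensities must be non-negative")
    total = uncut + cut_a + cut_b
    if total <= 0:
        raise ValueError("at least one band must have signal")
    fraction_cut = (cut_a + cut_b) / total
    # Heteroduplexes form from random re-annealing, hence the square root.
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cut))


if __name__ == "__main__":
    # Hypothetical densitometry values in arbitrary units.
    print(surveyor_indel_percent(uncut=75, cut_a=15, cut_b=10))
```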

Colony formation assay. After lentiviral transduction with shCtrl or the indicated
shRNA (sequences listed below), cells were selected with puromycin (BJ: 0.5 µg ml⁻¹;
MCF7 and HCC1937: 1 µg ml⁻¹; MEFs: 2 µg ml⁻¹) for 72 h and plated in 6-cm dishes (1,000
and 10,000 cells per plate). After 10-14 days, colonies were fixed with 3%
paraformaldehyde (5 min), rinsed with PBS, and stained with crystal violet (Sigma-Aldrich).

CRISPR targeting to mutate Polq gene in mouse embryonic stem cells. To generate
cells carrying a catalytic-dead Polθ, two mutations at residues Asp2494Gly and
Glu2495Ser (ref. 34) were introduced in the endogenous Polq locus in mouse
embryonic stem cells using CRISPR/Cas9 gene targeting. Two guide RNAs were
co-transfected with a Cas9-nickase (pX335-U6-Chimeric_BB-CBh-hSpCas9n(D10A)),
and a donor cassette that introduces a SacII restriction site while replacing the two
amino acid residues. Clonal cell lines were derived and genotyped to determine
successful targeting. Two independent clonally derived lines were used for the
analysis of translocation.

shRNA target sequence (pLKO.1 vector). shPolm: 5′-CTCACCTCTCACACACCATAA-3′;
shPolk: 5′-GCCCTTAGAAATGTCTCATAA-3′; shPolh: 5′-GCTCGATTCTCCAGCTTACAA-3′;
shPoli: 5′-AGTGAAGAAGATACGTTTAAA-3′; shPolb: 5′-CCAAAGTTGTTACATCGTGTT-3′;
shPoln: 5′-CCTACTCACATGAAGGACATT-3′; shLig3 (mouse): 5′-CCAGACTTCAAACGTCTCAAA-3′;
shLIG3 (human): 5′-CCGGATCATGTTCTCAGAAAT-3′; shPolq-1 (mouse):
5′-CGGCGGAGTATGAGAACTATT-3′; shPolq-2 (mouse): 5′-CCAGGAATCAAAGACGACAAT-3′;
shPolq-3 (mouse): 5′-CCTGGCTGAATGCTGAACTTT-3′;
shPOLQ (human): 5′-CGGGCCTCTTTAGATATAAAT-3′; shBrca1 (mouse):
5′-GCTCAGTGTATGACTCAGTTT-3′.

Primers for quantitative PCR. BRCA1Fwd: 5′-CTGCCGTCCAAATTCAAGAAGT-3′
and BRCA1Rev: 5′-CTTGTGCTTCCCTGTAGGCT-3′; BRCA2Fwd:
5′-TGTGGTAGATGTTGCTAGTCCGCC-3′ and BRCA2Rev: 5′-GCTTTTCTCGTTGTAGTACTGCC-3′;
POLbFwd: 5′-TGAACCATCATCAACGAATTGGG-3′
and POLbRev: 5′-CCATGTCTCCACTCGACTCTG-3′; POLmFwd: 5′-AGGCTTCCGCGTCCTAGAT-3′
and POLmRev: 5′-GTGGGGAGAGCATCCATGTT-3′;
POLkFwd: 5′-AGCTCAAATTACCAGCCAGCA-3′ and POLkRev: 5′-GGTTGTCCCTCATTTCCACAG-3′;
POLhFwd: 5′-ATCGAGTGGTTGCTCTTGTAGA-3′ and POLhRev: 5′-CCAAATGCTCGGGCTTCATAG-3′;
POLiFwd: 5′-GCAGTCAAGGGCCACCTAC-3′ and POLiRev: 5′-AGGTCTGTCCTTTAATTCTGGGT-3′;
POLnFwd: 5′-AGCTGATGGATGCTCTCAAGCAGG-3′ and POLnRev:
5′-GAGTCAGAGTGCTGTTGCCTACATGG-3′; LIG3(mouse)Fwd: 5′-GAAGAAAGCTGCTGTCCAGG-3′
and LIG3(mouse)Rev: 5′-CAGAGTTGTTGGGTTTTGCTG-3′;
LIG3(human)Fwd: 5′-GAAGAGCTGGAAGATAATGAGAAGG-3′
and LIG3(human)Rev: 5′-AGTGGTTGTCAACTTAGCCTGG-3′; POLQ(mouse)Fwd:
5′-CAAGGTTTCATTCGGGTCTTGG-3′ and POLQ(mouse)Rev: 5′-CGAGCAGGAAGATTCACTCCAG-3′;
POLQ(human)Fwd: 5′-CAGCCCTTATAGTGGAAGAAGC-3′
and POLQ(human)Rev: 5′-GCACATGGATTCCATTGCACTC-3′.

Statistical analysis. Results are presented as mean ± s.d. of two or three independent
experiments unless otherwise stated. P < 0.05 was considered statistically
significant, and P values were calculated using the two-tailed Student’s t-test. No statistical
methods were used to predetermine sample size.
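
As an illustration of the analysis described above, the following sketch computes a two-tailed Student’s t-test for two independent groups using only the Python standard library. It assumes the equal-variance (pooled) form of the test, which the text does not specify, and obtains the P value by numerically integrating the t density with Simpson’s rule. All function names and the example measurements are hypothetical.

```python
import math
import statistics


def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=10_000):
    """P(|T| > |t|), integrating the density from 0 to |t| with Simpson's rule."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1 - 2 * area)


def student_t_test(a, b):
    """Return (t, degrees of freedom, two-tailed P) for two independent samples."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)) / df
    if pooled == 0:
        raise ValueError("both samples have zero variance")
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, df, two_tailed_p(t, df)


if __name__ == "__main__":
    # Hypothetical measurements from three independent experiments per group.
    control = [10.2, 11.5, 9.8]
    treated = [5.1, 6.0, 4.7]
    print("mean ± s.d.:", statistics.mean(control), statistics.stdev(control))
    print(student_t_test(control, treated))
```

With only two or three experiments per group the test has few degrees of freedom, so the P value is sensitive to each individual measurement.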
Extended Data Figure 1 | Polθ promotes alt-NHEJ repair at dysfunctional
telomeres. (Related to Fig. 1.) a, Immunoblots for TRF1 and RAP1 after
4-OHT-induced depletion of TRF2 from Trf2F/F Cre-ERT2 MEFs and
co-depletion of TRF1 and TRF2 from Trf1F/F Trf2F/F Ku80−/− Cre-ERT2 cells.
Loss of TRF2 is confirmed by the disappearance of RAP1, a TRF2-interacting
protein the stability of which depends on TRF2 (refs 35, 36). b, To validate the
effect of Polq depletion on alt-NHEJ we monitored the frequency of telomere
fusions in shelterin-free Ku80-null cells treated with three independent shPolq
vectors. shPolq-1 was used in Fig. 2. Mean values are presented with error
bars denoting ± s.e.m. from two independent experiments.
Extended Data Figure 2 | Polθ drives chromosomal translocations in mouse
cells. (Related to Fig. 2.) a, Immunoblotting for Polθ in MEFs with the
indicated genotype and treatment. b, Immunoblot for TRF1 in MEFs with the
indicated genotype. Cells were analysed 96 h after Cre induction. c, RAP1
immunoblot (similar to b). d, Western blot analysis for Polθ and Flag-Cas9
in lysates prepared from Polq−/− and Polq+/+ cells after Cas9 expression.
Tubulin serves as a loading control. e, Surveyor nuclease assay for Polq−/− and
Polq+/+ cells expressing Cas9-gRNA(Rosa26;H3f3b) plasmid. Genomic DNA
isolated from cells with the indicated genotype was used as a template to
amplify across the cleavage site at either the Rosa26 or the H3f3b locus to assess
intra-chromosomal NHEJ. Amplification products were denatured and then
re-annealed to form heteroduplexes between unmodified and modified
sequences from imprecise NHEJ. The mismatched duplex was selectively
cleaved by the Surveyor nuclease at the loops that form at mismatches.
f, Signature of translocations in Polq−/− and Polq+/+ cells (see Extended Data
Figs 3-5 for complete list of sequences). Table records the total number of
translocation events identified following CRISPR-Cas9-induced cleavage.
On average, the same number of nucleotides was deleted at the fusion junction
in Polq−/− and Polq+/+ cells. No nucleotide insertions were found in the
absence of Polq. Lastly, the percentage of junctions exhibiting microhomology
was significantly reduced in cells lacking Polq. g, Scheme depicting Polθ
domains. CRISPR/Cas9 gene targeting was used to create two base substitutions
at Asp2494Gly and Glu2495Ser, and generate a catalytic-dead polymerase.
h, Sequence analysis of targeted cells. i, Genotyping PCRs of Polq+/+ and
PolqCD (catalytically dead allele of Polq) after SacII digestion. j, Immunoblotting
to analyse Cas9 expression in Polq+/+ and two independently derived PolqCD
clonal cell lines. k, Frequency of chromosomal translocations (der(6)) in
Polq+/+ and PolqCD cells. Bars represent mean of four independent
experiments ± s.d. (two experiments per clonal cell line). **P = 0.006; two-tailed
Student’s t-test. PCR products were sequenced to confirm translocation
and identify possible insertions.
[Figure: Translocation junction sequences of Polq+/+ cells, der(11). Reference sequence, with "|" separating the Chr-11 and Chr-6 portions: TGCCATAAAAACCGCTTCAACTTAAGCTCTCTCCCCCCGTATCCGGCGAGCC | TCTAGAAAGACTGGAGTTGCAGATCACGAGGGAAGAGGGGGAAGGGATTC. The aligned sequences of individual junctions are not recoverable from the extracted text.]

Extended Data Figure 3 | Sequence analysis of translocation junctions in
Polq+/+ cells. (Related to Fig. 2.) Sequences of der(11) breakpoint junction
from Polq+/+ cells. Predicted fusion breakpoint based on CRISPR cutting
indicated by an arrow. Reference sequence highlighted at the top. The
remaining lines represent individual translocations recovered by PCR and
subject to Sanger sequencing. Nucleotide insertions are marked in red. In cases
where insertions extended beyond the sequence included in the lane, the
length of the insertion was noted in parentheses (red). Gaps in the sequence
represent nucleotide deletions. The average length of the deletions was noted in
Extended Data Fig. 2f. Micro-homology is denoted by blue boxes.
Micro-homology embedded in DNA extending beyond the enclosed sequence was
noted in parentheses (blue).
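
The predicted breakpoint mentioned in this caption can be sanity-checked against the Rosa26 guide RNA given in the Methods. The sketch below assumes the usual behaviour of SpCas9, which cuts about 3 bp upstream of the PAM, and uses the Chr-6 portion of the der(11) reference sequence as transcribed from the figure. The function names are hypothetical, and the check is illustrative rather than part of the authors' analysis.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def pam_distal_flank(guide, cut_offset=3):
    """PAM-distal part of the protospacer, read on the opposite strand.

    Assumes the cut falls cut_offset nucleotides upstream of the PAM,
    that is, cut_offset nucleotides before the 3' end of the guide.
    """
    return reverse_complement(guide[:-cut_offset])


# Rosa26 guide from the Methods and the Chr-6 side of the der(11) reference.
ROSA26_GUIDE = "ACTCCAGTCTTTCTAGAAGA"
CHR6_SIDE = "TCTAGAAAGACTGGAGTTGCAGATCACGAGGGAAGAGGGGGAAGGGATTC"

if __name__ == "__main__":
    flank = pam_distal_flank(ROSA26_GUIDE)
    # True when the reference junction starts exactly at the predicted cut site.
    print(flank, CHR6_SIDE.startswith(flank))
```

A junction that starts further into the Chr-6 sequence than this flank would correspond to a deletion at that end, which is how the gaps in the alignment are read.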
[Figure: Translocation junction sequences of Polq+/+ cells, der(6). Reference sequence, with "|" separating the Chr-6 and Chr-11 portions: ACGCCCACACACCAGGTTAGCCTTTAAGCCTGCCCAGAAGACTCCCGCCCATCT | AACTGGATGTCTTTGGGCATGATGGTGACTCTCTTGGCGTGGATGGCACACAGATTC. The aligned sequences of individual junctions are not recoverable from the extracted text.]

Extended Data Figure 4 | Sequence analysis of translocation junctions in
Polq+/+ cells. (Related to Fig. 2.) Sequences of der(6) breakpoint junction from
Polq+/+ cells. Predicted fusion breakpoint based on CRISPR cutting indicated
by an arrow. Reference sequence highlighted at the top. The remaining lines
represent individual translocations recovered by PCR and subject to Sanger
sequencing.
[Figure: Translocation junction sequences of Polq−/− cells. Der(11) reference sequence, with "|" separating the Chr-11 and Chr-6 portions: TGCCATAAAAACCGCTTCAACTTAAGCTCTCTCCCCCCGTATCCGGCGAGCC | TCTAGAAAGACTGGAGTTGCAGATCACGAGGGAAGAGGGGGAAGGGATTCT. Der(6) reference sequence, with "|" separating the Chr-6 and Chr-11 portions: TGCAGGACAACGCCCACACACCAGGTTAGCCTTTAAGCCTGCCCAGAAGACTCCCGCCCATCT | AACTGGATGTCTTTGGGCATGATGGTGACTCTCTTGGCGTGG. The aligned sequences of individual junctions are not recoverable from the extracted text.]

Extended Data Figure 5 | Sequence analysis of translocation junctions in
Polq−/− cells. (Related to Fig. 2.) Sequences of der(11) and der(6) breakpoint
junction from Polq−/− cells. Predicted fusion breakpoint based on CRISPR
cutting indicated by an arrow. Reference sequence is highlighted at the top. The
remaining lines represent individual translocations recovered by PCR and
subject to Sanger sequencing. It is important to note that insertions were
completely lacking at the fusion junctions in Polq−/− cells.
Extended Data Figure 6 | Polθ recruitment to DNA breaks. (Related to
Fig. 3.) a, Laser micro-irradiation experiment using HeLa cells expressing
Myc-Polθ and treated with ATM inhibitor (KU55933), ATR inhibitor
(VE-821) or PARP inhibitor (KU58948). b, Western blot analysis for CHK1
and CHK2 phosphorylation. Cells with the indicated treatment were analysed
2 h after irradiation. c, Immunoblot for PARP1. HeLa cells were treated with
PARP1 siRNA and analysed 72 h after siRNA transfection for efficiency of
knockdown.
[Figure panel a: immunofluorescence images of U2OS-DSB reporter cells expressing Myc-Polθ. Channels: Myc-Polθ, FokI, Merge, Hoechst. Conditions: Control (low-expressing and high-expressing cells) and PARPi (high-expressing cells).]

Panel b:

Experiment     Treatment   # Cells   # cells with Polθ+/FokI+ colocalization   % cells with Polθ+/FokI+ colocalization
Experiment 1   Control     217       55                                        25.3%
Experiment 1   PARPi       213       23                                        10.8%
Experiment 2   Control     247       66                                        26.7%
Experiment 2   PARPi       240       25                                        10.4%

Extended Data Figure 7 | PARP1-dependent Polθ recruitment to DNA
double-stranded breaks (DSBs). (Related to Fig. 3.) a, Results from
immunofluorescence performed 4 h after induction (1 µM Shield1 ligand,
Clontech 631037; 0.5 µM 4-OH tamoxifen) of DSBs by mCherry-LacI-FokI
in the U2OS-DSB reporter cells transfected with Myc-Polθ and treated
with PARP inhibitor (KU58948). The mCherry signal is used to identify the
area of damage and to assess the recruitment of Myc-Polθ to cleaved LacO
repeats. b, Table displaying quantification related to a.
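
The percentages in panel b follow directly from the cell counts. The short sketch below recomputes them from the counts given in the table; the data structure and function name are illustrative only.

```python
# (experiment, treatment): (total cells scored, cells with Polθ/FokI colocalization)
COUNTS = {
    ("Experiment 1", "Control"): (217, 55),
    ("Experiment 1", "PARPi"): (213, 23),
    ("Experiment 2", "Control"): (247, 66),
    ("Experiment 2", "PARPi"): (240, 25),
}


def percent_colocalized(total_cells, colocalized_cells):
    """Percentage of scored cells that show colocalization."""
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    return 100.0 * colocalized_cells / total_cells


if __name__ == "__main__":
    for (experiment, treatment), (total, coloc) in COUNTS.items():
        value = percent_colocalized(total, coloc)
        print(f"{experiment} {treatment}: {coloc}/{total} = {value:.1f}%")
```

In both experiments the PARP inhibitor lowers the fraction of cells with colocalization from roughly a quarter to roughly a tenth.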
Extended Data Figure 8 | Polθ suppresses homology-directed repair at
dysfunctional telomeres. (Related to Fig. 3.) a, Western blot analysis for Polθ
and LIG3 in shelterin-free Lig4-null MEFs. b, Western blot for TRF1 and RAP1
after 4-OHT treatment of shelterin-free Lig4-deficient cells. c, Metaphase
spreads from Trf1F/F Trf2F/F Lig4−/− Cre-ERT2 MEFs, with the indicated
shRNA treatment, 96 h after Cre expression. CO-FISH assay was performed
using a FITC-OO-(CCCTAA)3 PNA probe (green) and a Tamra-OO-(TTAGGG)3
PNA probe (red). DAPI in blue. Examples of alt-NHEJ-mediated
fusion and T-SCE events (HDR) are indicated by white and red arrows,
respectively. Examples of T-SCE events reflective of increased HDR in cells
treated with shPolq are on the right. d, e, Quantification of telomere fusions by
alt-NHEJ in MEFs with the indicated genotype and shRNA treatment. Bars
represent mean of two independent experiments ± s.e.m. f, Representative
in-gel hybridization to assess 3′ overhang of Trf1F/F Trf2F/F Lig4−/− Cre-ERT2
MEFs with the indicated shRNA treatment after Cre deletion. g, Quantification
of the gel in f. The single-stranded DNA/total signal ratios of the ‘+Cre’
samples are expressed relative to the ‘−Cre’ samples for each shRNA treatment.
Mean of two independent experiments. h, Graph representing RAD51
accumulation after ionizing radiation treatment of PolqCD, Polq+/+ and
Polq−/− embryonic stem cells. Bars represent mean of two independent
experiments. *P > 0.05; two-tailed Student’s t-test.
Extended Data Figure 9 | Polθ promotes alt-NHEJ and inhibits
homology-directed repair at I-SceI-induced DNA breaks. (Related to
Fig. 3.) a, Polθ represses recombination at DSBs induced by I-SceI. The TLR
system was used to measure the relative ratio of end-joining (mCherry) and
HDR (enhanced green fluorescent protein (eGFP)) repair of a DSB. A diagram
of the TLR is represented. b, The TLR construct was stably integrated into
Lig4−/− and Ku80−/− MEFs to avoid the confounding effect of C-NHEJ, and
limit end-joining reactions to the alt-NHEJ pathway. Expression of mCherry
and eGFP was assessed by flow cytometry 72 h after I-SceI and 5′ eGFP donor
transduction in cells with the indicated shRNA construct. Percentages of cells
are indicated in the plot. c, Quantification of alt-NHEJ and HDR of TLR
containing Ku80−/− MEFs after expression of I-SceI and 5′ eGFP together with
the indicated shRNA construct. Bar graphs represent the mean of three
independent experiments ± s.d. *P = 0.03; two-tailed Student’s t-test. d, Real-time
PCR to monitor the knockdown efficiency of Polq in Ku80−/− and
Lig4−/− MEFs. The FACS analysis reported in e and f was carried out without
selecting for cells expressing the shRNA-containing plasmid.
Extended Data Figure 10 | Polθ is required for survival of
recombination-deficient cells. (Related to Fig. 4.) a, Accumulation of chromosomal
aberrancies after Brca1 and Brca2 knockdown in Polq−/− and Polq+/+
MEFs. Quantification of chromosomal aberrancies (chromatid breaks,
chromosome breaks and radials) in MEFs stably transduced with lentiviral
vectors expressing the indicated shRNA. b, Real-time PCR to confirm the
knockdown of Brca1 and Brca2 as in a. c, Quantitative analysis of colony
formation in Brca1F/F Cre-ERT2 and Lig4−/− cells after Polq depletion. The
number of colonies in control shRNA-treated cells was set to 100%. Mean
values are presented with error bars denoting ± s.d. from three independent
experiments. d, Real-time PCR to measure the knockdown efficiency of human
POLQ in BJ-hTERT, MCF7 and HCC1937 cells and mouse Polq in Brca1F/F
Cre-ERT2 cells. e, Quantitative analyses of colony formation in BJ-hTERT,
MCF7 and HCC1937 cells after LIG3 inhibition. The number of colonies in
control-shRNA-treated cells was set to 100%. The knockdown efficiency for
Lig3 was ~85%. Bars represent mean of two independent experiments ± s.e.m.
f, Quantitative analyses of colony formation in PolqCD and Polq+/+ embryonic
stem cells after BRCA1 inhibition. The number of colonies in
control-shRNA-treated cells was set to 100%. The knockdown efficiency for BRCA1
was >80%. Bars represent mean of two independent experiments ± s.e.m.
*P = 0.05; two-tailed Student’s t-test.
Homologous-recombination-deficient tumours are
dependent on Polθ-mediated repair

Kevin W. O’Connor, Panagiotis A. Konstantinopoulos, Stephen J. Elledge, Simon J. Boulton, Timur Yusufzai
& Alan D. D’Andrea


Large-scale genomic studies have shown that half of epithelial ovarian
cancers (EOCs) have alterations in genes regulating homologous
recombination (HR) repair¹. Loss of HR accounts for the genomic
instability of EOCs and for their cellular hyper-dependence on alternative
poly-ADP ribose polymerase (PARP)-mediated DNA repair
mechanisms²⁻⁵. Previous studies have implicated the DNA polymerase θ
(Polθ, also known as POLQ, encoded by POLQ)⁶ in a pathway
required for the repair of DNA double-strand breaks⁷⁻⁹, referred to
as the error-prone microhomology-mediated end-joining (MMEJ)
pathway¹⁰⁻¹³. Whether Polθ interacts with canonical DNA repair pathways
to prevent genomic instability remains unknown. Here we report
an inverse correlation between HR activity and Polθ expression in
EOCs. Knockdown of Polθ in HR-proficient cells upregulates HR
activity and RAD51 nucleofilament assembly, while knockdown of
Polθ in HR-deficient EOCs enhances cell death. Consistent with these
results, genetic inactivation of an HR gene (Fancd2) and Polq in mice
results in embryonic lethality. Moreover, Polθ contains RAD51 binding
motifs and it blocks RAD51-mediated recombination. Our results
reveal a synthetic lethal relationship between the HR pathway and
Polθ-mediated repair in EOCs, and identify Polθ as a novel druggable
target for cancer therapy.

To examine changes in polymerase activity between tumours and
normal tissues, we screened polymerase gene expression profiles in a
large number of cancers (Supplementary Table 1). Gene set enrichment
analysis (GSEA) revealed specific and recurrent overexpression of Polθ
in EOCs (Extended Data Fig. 1a–c). Polθ was upregulated in a
grade-dependent manner and its expression positively correlated with numerous
mediators of HR (Extended Data Fig. 1d–j). As Polθ has been suggested
to play a role in DNA repair’ "°, we investigated a potential role for Polθ
in HR repair.

To test the relationship between Polθ expression and HR, we used a
cell-based assay in human cells which measures the efficiency of
recombination of two GFP alleles (DR-GFP assay)'*. Knockdown of Polθ with
short interfering RNA (siRNA) (Extended Data Fig. 2a) resulted in an
increase in HR efficiency, similar to that observed by depleting the
anti-recombinases PARI or BLM’**”’. Depletion of Polθ caused a significant
increase in basal and radiation (IR)-induced RAD51 foci (Fig. 1a, b and
Extended Data Fig. 2b–d), and depletion of Polθ in 293T cells conferred
cellular hypersensitivity to mitomycin C (MMC) and an increase in
MMC-induced chromosomal aberrations (Extended Data Fig. 2e, f).
These findings suggest that human Polθ inhibits HR and participates
in the maintenance of genome stability.

Given that Polθ shares structural homology with coexpressed
RAD51-binding ATPases (Extended Data Fig. 1k, l), we hypothesized that Polθ
might regulate HR through an interaction with RAD51. RAD51 was
detected in Flag-tagged Polθ immunoprecipitates, and purified full-length
Flag–Polθ bound recombinant human RAD51 (Fig. 1c, d). Pull-down
assays with recombinant GST-RAD51 and in vitro translated Polθ
truncation mutants defined a region of Polθ binding to RAD51 spanning
amino acids 847–894 (Fig. 1e, f and Extended Data Fig. 2g, h). Sequence
homology of Polθ with the RAD51 binding domain of C. elegans RFS-1
(ref. 17) identified a second binding region (Extended Data Fig. 2i).
Peptide arrays narrowed down the RAD51 binding activity of Polθ to three
distinct motifs (Fig. 1g and Extended Data Fig. 2j). Substitution arrays
confirmed the interaction and highlighted the importance of the
847–894 Polθ region as both necessary and sufficient for RAD51 binding
(Extended Data Fig. 3a, b). Together these results indicate that Polθ is
a RAD51-interacting protein that regulates HR.

In order to address the role of Polθ in HR regulation, we assessed the
ability of wild-type or mutant Polθ to complement the siPolθ-dependent
increase in RAD51 foci. Full-length wild-type Polθ fully reduced
IR-induced RAD51 foci, unlike Polθ mutated at ATPase catalytic
residues (A-dead) or Polθ lacking interaction with RAD51 (ΔRAD51)
(Fig. 2a, b). Expression of a Polθ mutant lacking the polymerase domain
(ΔPol1) was sufficient to decrease IR-induced RAD51 foci, suggesting
that the N-terminal half of Polθ is sufficient to disrupt RAD51 foci (Fig. 2b
and Extended Data Fig. 3c, d). We next measured the ability of
wild-type or mutant Polθ to complement the siPolθ-dependent increase in
HR efficiency. Again, expression of full-length Polθ or ΔPol1 decreased
the recombination frequency when compared to cells expressing other
Polθ constructs, suggesting that the N-terminal half of Polθ containing
the RAD51 binding domain and the ATPase domain is needed to inhibit
HR (Fig. 2c and Extended Data Fig. 3e).

A purified recombinant Polθ fragment (ΔPol2) from insect cells
exhibited low levels of basal ATPase activity, as previously reported’* (Fig. 2d, e).
Polθ ATPase activity was selectively stimulated by the addition of
single-stranded DNA (ssDNA) or fork DNA (Fig. 2e and Extended Data Fig. 4a).
Electrophoretic mobility gel shift assays (EMSA) showed specific
binding of Polθ to ssDNA (Fig. 2f and Extended Data Fig. 4b). We incubated
ΔPol2 with ssDNA and measured RAD51-ssDNA nucleofilament
assembly. Interestingly, RAD51-ssDNA assembly was reduced by ΔPol2
wild-type but not by A-dead or ΔRAD51, indicating that Polθ negatively affects
RAD51-ssDNA assembly through its RAD51 binding and ATPase
activities (Fig. 2g and Extended Data Fig. 4c–f). Furthermore, Polθ decreased
the efficiency of D-loop formation, confirming that Polθ is a negative
regulator of HR (Fig. 2h and Extended Data Fig. 4g–j).

As Polθ is upregulated in subgroups of cancers associated with HR
deficiency (Fig. 3a) and Polθ activity shows specificity for
replicative-stress-mediated structures (ssDNA and fork DNA) (Fig. 2e, f), we examined
the cellular functions of Polθ under replicative stress. Subcellular
fractionation revealed that Polθ is enriched in chromatin in response to
ultraviolet (UV) light; and RAD51 binding by Polθ was enhanced by UV

Figure 1 | Polθ is a RAD51-interacting protein that suppresses HR.
a, DR-GFP assay in U2OS cells transfected with indicated siRNA.
b, Quantification of RAD51 foci in U2OS cells transfected with indicated
siRNA. c, Endogenous RAD51 co-precipitates in vivo with purified full-length
Flag-tagged Polθ from whole cell extracts. EV, empty vector. d, GST pull-down
experiment with full-length Flag-tagged Polθ (* indicates non-specific band).
e, GST-RAD51 pull-down with in vitro translated Polθ truncation mutants.
f, GST-RAD51 pull-down with in vitro translated Polθ versions missing
indicated amino acids. g, Ponceau staining and immunoblotting of peptide
arrays for the indicated Polθ motifs probed with recombinant RAD51. The Polθ
amino acids spanning RAD51-interacting motifs are shown. Data in a and
b represent mean ± s.e.m.


exposure, suggesting that Polθ regulates HR in cells under replicative
stress (Extended Data Fig. 5a, b). Polθ-depleted cells were hypersensitive
to cellular stress and DNA damage, and showed exacerbated checkpoint
activation and increased γH2AX phosphorylation (Fig. 3b, c).
Furthermore, the cell cycle progression of Polθ-depleted cells was impaired after
DNA damage (Fig. 3d, e). To determine the role of Polθ in replication

Figure 2 | Polθ inhibits RAD51-mediated recombination. a, Schematic of
Polθ mutants used in complementation studies and their interaction with
RAD51. WT, wild type. b, Quantification of RAD51 foci in U2OS cells
transfected with indicated siRNA and Polθ cDNA constructs refractory to
siPolθ1. c, DR-GFP assay in U2OS cells transfected with indicated siRNA and
Polθ cDNA constructs refractory to siPolθ1. d, Coomassie-stained gel of the
purified Polθ fragment. e, Quantification of Polθ ATPase activity.
f, Quantification of Polθ binding to ssDNA and dsDNA. g, RAD51-ssDNA
nucleofilament assembly assay. h, Assessment of RAD51-dependent D-loop
formation. Data in b, c, e and f represent mean ± s.e.m.


12 FEBRUARY 2015 | 
©2015 Macmillan Publishers Limited. All rights reserved 


VOL 518 | NATURE | 259 


LETTER 


[Figure 3 image. Recoverable panel labels: a, POLQ expression by tumour subtype (ovarian, uterine, breast); b, survival of A2780 shScr and shPolθ cells after HU (mM), MMC (ng ml−1) or IR (Gy), with Flag (Polθ) and vinculin immunoblot; c, immunoblots for pS824KAP1, pT68CHK2, pS317CHK1, pS15p53, γH2AX and vinculin at times after release from HU (2 mM), MMC (1 μg ml−1) or IR (10 Gy); d, e, cell cycle distribution and cycling fraction at times after release; f, g, CldU/IdU fibre labelling with or without HU (2 mM, 2 h).]

Figure 3 | Polθ promotes S phase progression and recovery of stalled forks.
a, POLQ gene expression in subtypes of cancers with HR deficiency. b, Survival
assays of A2780 cells exposed to the indicated DNA-damaging agents.
Immunoblot showing silencing efficiency is shown on the right. c, Immunoblot
analyses following pulse treatments with DNA-damaging agents (*γH2AX:
see Methods for specific time points used for γH2AX immunoblot). HU,
hydroxyurea. d, Cell cycle progression of synchronized A2780 cells. A


dynamics, single-molecule analyses were performed on extended DNA
fibres'’. Abnormalities in replication fork progression were observed in
Polθ-depleted cells (Fig. 3f, g and Extended Data Fig. 5c, d). These results
suggest that Polθ maintains genomic stability at stalled or collapsed
replication forks by promoting fork restart.

To examine the regulation of Polθ, we quantified Polθ expression
by RT-qPCR. Polθ was selectively upregulated in HR-deficient ovarian
cancer cell lines. Complementation of BRCA1- or FANCD2-deficient
cell lines with BRCA1 and FANCD2 cDNA, respectively, restored normal
HR function and reduced Polθ expression to normal levels. Conversely,
siRNA-mediated inhibition of HR genes increased Polθ expression
(Extended Data Fig. 5e, f). Polθ expression was significantly higher in
subgroups of cancers with HR deficiency and a high genomic instability
pattern” (Fig. 3a and Extended Data Fig. 5g). Patients with high Polθ
expression had a better response to platinum chemotherapy, a surrogate
for HR deficiency, suggesting that Polθ expression inversely correlates
with HR activity and may be useful as a biomarker for platinum
sensitivity (Extended Data Fig. 5h, i). Together, these data indicate that
increased Polθ expression is driven by HR deficiency.

To assess the possible synthetic lethality between HR genes and Polθ,
we generated an HR-deficient ovarian tumour cell line, A2780-shFANCD2
cells (Extended Data Fig. 6a–c). These cells, and the parental A2780 cells,
were subjected to Polθ depletion, and survival following exposure to
cytotoxic drugs was measured. Polθ depletion reduced the survival of

representative cell cycle distribution. e, Fraction of cycling A2780 cells
measured by 5-ethynyl-2′-deoxyuridine (EdU) incorporation. f, Quantification
of DNA fibre lengths. g, Percentage of stalled forks. All experiments shown
in a–d were performed in two cell lines (A2780 and 293T). All data represent
mean ± s.e.m. except for box plots in f that show twenty-fifth to seventy-fifth
percentiles, with lines indicating the median, and whiskers indicating the
smallest and largest values.


HR-deficient cells exposed to inhibitors of PARP (PARPi), cisplatin
(CDDP) or MMC (Extended Data Fig. 6d–f). Polθ inhibition impaired
the survival of BRCA1-deficient tumours (MDA-MB-436) after PARPi
treatment but had no effect on the complemented line (MDA-MB-436
+ BRCA1) (Fig. 4a). Polθ-depleted cells were hypersensitive to ATM
inhibition, known to create an HR defect phenotype”. Chromosomal
breakage, checkpoint activation, and γH2AX phosphorylation in response
to MMC were exacerbated by Polθ depletion (Fig. 4b and Extended Data
Fig. 6g, h). Furthermore, a whole-genome short hairpin RNA (shRNA)
screen performed on HR-deficient (FANCA−/−) fibroblasts showed that
shRNAs targeting Polθ impair cell survival in MMC (Extended Data
Fig. 6i), suggesting that HR-deficient cells cannot survive in the absence
of Polθ.

Next, we investigated the interaction between the HR and Polθ
pathways in vivo by interbreeding Fancd2+/− and Polq+/− mice. Although
Fancd2−/− and Polq−/− mice are viable and exhibit subtle phenotypes”,
viable Fancd2−/−Polq−/− mice were uncommon from these matings
(Extended Data Fig. 7a). The only surviving Fancd2−/−Polq−/− pups exhibited
severe congenital malformations and were either found dead or died
prematurely. Fancd2−/−Polq−/− embryos showed severe congenital
malformations, and mouse embryonic fibroblasts (MEFs) generated from
Fancd2−/−Polq−/− embryos showed hypersensitivity to PARPi (Fig. 4c and
Extended Data Fig. 7b–e). These data suggest that loss of the HR and
Polθ repair pathways in vivo results in embryonic lethality.

Figure 4 | Synthetic lethality between HR and Polθ repair pathways.
a, Clonogenic formation of BRCA1-deficient (MDA-MB-436) cells expressing
indicated cDNA together with indicated shRNA. b, Chromosome breakage
analysis of HR-deficient cells transfected with the indicated siRNA. A
representative image is shown. Arrows indicate chromosomal aberrations.
c, Embryos at day 14 of gestation. d, Growth of indicated xenografts in vivo.
Immunoblot showing silencing efficiency. e, Relative tumour volumes (RTV)
for individual mice treated in d after three weeks of treatment. f, Overall


As xenografts of tumour cells expressing shRNAs against both
FANCD2 and Polθ did not stably propagate in mice (Extended Data
Fig. 7f), we xenotransplanted A2780-shFANCD2 cells expressing either
doxycycline-inducible Polθ or scrambled (Scr) shRNA in athymic nude
mice. Polθ depletion significantly impaired tumour growth after PARPi
treatment (Fig. 4d, e and Extended Data Fig. 7g, h). Moreover, mice
bearing Polθ-depleted tumours had a survival advantage following PARPi
treatment compared to control mice (Fig. 4f). Polθ-depleted HR-deficient
tumour cells also exhibited decreased survival in in vivo dual-colour
competition experiments (Extended Data Fig. 7i–l). Collectively, these
data confirm that HR-deficient tumours are hypersensitive to
inhibition of Polθ-mediated repair.

To understand which functions of Polθ are required for resistance
to DNA-damaging agents, we performed a series of complementation
studies in HR-deficient cells. Expression of full-length Polθ or ΔPol1,
but not ΔRAD51, in HR-deficient Polθ-depleted cells treated with PARPi
or MMC was able to rescue toxicity, suggesting that the anti-recombinase
activity of Polθ maintains the genomic stability of HR-deficient cells
(Fig. 4g, h and Extended Data Fig. 8a, b). Moreover, the toxicity induced
by loss of Polθ in HR-deficient cells was rescued by depletion of RAD51,
showing that, in the absence of Polθ, RAD51 is toxic to HR-deficient
cells (Fig. 4i). These results suggest a role for Polθ in limiting toxic HR
events” (Extended Data Fig. 8c–f) and may explain why HR-deficient
cells overexpress and depend on an anti-recombinase for survival.

High mutation rates have been observed in HR-deficient tumours”.
Previous studies have shown that Polθ is an error-prone polymerase**”°

survival for mice treated with vehicle or PARPi. Log-rank P< 107°.
g, h, Clonogenic formation (g) and chromosome breakage analysis (h) of
BRCA2-deficient cells expressing Polθ cDNA constructs refractory to siPolθ1
and transfected with the indicated siRNA. i, Clonogenic formation of
BRCA2-deficient cells transfected with the indicated siRNA. j, Model for role of
Polθ in DNA repair. Data in a, b, g and i represent mean ± s.e.m. For data
in d–f, each circle represents data from one tumour and each group represents
n = 7 tumours from n = 6 mice. Brackets show mean ± s.e.m.


that participates in alternative end-joining (alt-EJ)'°. Therefore, we
assessed the role of Polθ in error-prone DNA repair in human cancer
cells. Polθ inhibition reduced alt-EJ efficiency in U2OS cells, similar
to the reduction observed following depletion of PARP1, another
critical factor in end-joining”””* (Extended Data Fig. 9a). Expression of
full-length Polθ, ΔRAD51, or A-dead, but not the ΔPol1 mutant,
complemented the cells, suggesting that the polymerase domain of Polθ
is required for end-joining (Extended Data Fig. 9b). GFP-tagged
full-length Polθ formed foci after UV treatment in a PARP-dependent
manner (Extended Data Fig. 9c). Polθ inhibition reduced the mutation
frequency induced by UV light, and tumours with high Polθ
expression harboured more somatic point mutations than those with lower
Polθ levels (Extended Data Fig. 9d, e). These results suggest that Polθ
contributes to the mutational signature observed in some HR-deficient
tumours”.

In human cancers, a deficiency in one DNA repair pathway can
result in cellular hyper-dependence on a second compensatory DNA
repair pathway*. Here we show that Polθ is overexpressed in EOCs
and other tumours with HR defects”. Wild-type Polθ limits
RAD51-ssDNA nucleofilament assembly (Extended Data Fig. 10a) and
promotes alt-EJ (Fig. 4j). We demonstrate that HR-deficient tumours are
hypersensitive to inhibition of Polθ-mediated repair. Therefore, Polθ
appears to channel DNA repair by antagonizing HR and promoting
PARP1-dependent error-prone repair (Extended Data Fig. 10b). These
results offer a potential new therapeutic target for cancers with
inactivated HR.

METHODS


Bioinformatic analysis. The Gene Set Enrichment Analysis algorithm (GSEA,
http://www.broadinstitute.org) was run on the data sets summarized in
Supplementary Table 1. TransLesion Synthesis (TLS) and polymerase gene sets are
described in Supplementary Table 3. Raw expression data were downloaded from
Gene Expression Omnibus (GEO). Quantile normalizations were performed using the
RMA routine through GenePattern. GSEA was run using GenePattern
(http://www.broadinstitute.org) and corresponding P values were computed using
2,000 permutations. The DNA repair gene set used in Extended Data Fig. 1g was
determined according to a list of 151 DNA repair genes previously used*!. GSEA
analysis for the 151 repair genes was performed on the ovarian serous data sets
(GSE14001, GSE14007, GSE18520, GSE16708, GSE10971). The list of 20 genes shown
in Extended Data Fig. 1g represents the top 20 expressed genes in cancer samples
(median of the 5 data sets). The waterfall plot in Extended Data Fig. 1h was
generated as follows: the 20 genes defined in Extended Data Fig. 1g were used as
a gene set; GSEA for indicated data sets was performed and the nominal P values
were plotted. Supervised analysis of gene expression for GSE9891 was performed
with respect to differential expression that differentiated the third of
tumours with highest POLQ expression from the two-thirds with lowest POLQ
levels. A list of the 200 most differentially expressed probe sets between the
2 groups (Supplementary Table 2) with false discovery rate <0.05 was analysed
for biological pathways (hypergeometric test; http://www.broadinstitute.org).
TCGA data sets were accessed through the public TCGA data portal
(https://tcga-data.nci.nih.gov/tcga/). Fig. 3a reflects POLQ gene expression
in the ovarian carcinoma data set GSE9891, uterine carcinoma TCGA and breast
carcinoma TCGA. Normalization of POLQ expression values across data sets was
performed using z-score transformation. POLQ expression values were subdivided
into subgroups reflecting the stage of the disease (for GSE9891: grade 3 ovarian
serous carcinoma, n = 143 compared to type 1 (grade 1) ovarian cancers, n = 20;
for uterine: serous-like tumours, n = 60 compared to the rest of the tumours,
n = 172; for breast: basal-like breast carcinoma, n = 80 compared to the rest of
the tumours, n = 421). Progression-free survival curves were generated by the
Kaplan-Meier method and differences between survival curves were assessed for
statistical significance with the log-rank test. In the absence of a clinically
defined cut-off point for POLQ expression levels, we divided patients into 2
groups: those with POLQ mRNA levels equal to or above the median (POLQ high
group) and those with values below the median (POLQ low group). We then
analysed the correlation of POLQ with outcome in each group. Patients with
cyclin E1 (CCNE1) amplification (resistant to CDDP) were excluded from the
analysis. For mutation count, we accessed data from tumours included in the
TCGA data sets for which gene expression and whole-exome DNA sequencing was
available. Data were accessed through the public TCGA data portal and the
cBioPortal for Cancer Genomics (http://www.cbioportal.org). For each TCGA data
set, non-synonymous mutation count was assessed in tumours with the highest
POLQ expression (top 33%) and compared to tumours with low POLQ expression (the
remaining 67%). In the uterine TCGA”, we curated all tumours except the ultra
and hyper-mutated group (that is, POLE and MSI tumours). In the breast TCGA”,
all tumours were analysed. In the ovarian TCGA', we curated tumours harbouring
molecular alterations (via mutation and epigenetic silencing) of the HR pathway.
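
The z-score normalization and the two grouping rules described above (median split for the survival analysis; top third versus remaining two-thirds for the mutation counts) can be sketched as follows. This is a minimal illustration only: the function names and the expression values are hypothetical, and the choice of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, median, pstdev


def z_scores(values):
    """Standardize expression values within one data set: (x - mean) / s.d."""
    mu = mean(values)
    sd = pstdev(values)  # assumption: population s.d.; the source does not specify
    return [(v - mu) / sd for v in values]


def median_split(values):
    """Label a sample 'high' if it is at or above the median, otherwise 'low'."""
    cut = median(values)
    return ["high" if v >= cut else "low" for v in values]


def top_third_split(values):
    """Label the top third of samples by rank 'high' and the remainder 'low'."""
    n_high = round(len(values) / 3)
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    high = set(ranked[:n_high])
    return ["high" if i in high else "low" for i in range(len(values))]


# Hypothetical POLQ expression values for six tumours (illustration only)
polq = [1.0, 2.0, 3.0, 4.0, 5.0, 9.0]
normalized = z_scores(polq)
survival_groups = median_split(polq)
mutation_groups = top_third_split(polq)
```

Because z-scores are computed within each data set, values from GSE9891 and the TCGA cohorts become comparable on a common scale, while the group labels depend only on rank within a data set.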

Plasmid construction. To facilitate subcloning, a silent mutation (A390A) was
introduced into the POLQ gene sequence to remove the unique XhoI cutting site.
Full-length or truncated POLQ cDNA were PCR-amplified and subcloned into
pcDNA3-N-Flag, pFastBac-C-Flag, pOZ-C-Flag-HA, or GFP-C1 vectors to generate
the various constructs. Point mutations and loop deletions were introduced by
QuikChange II XL Site-Directed Mutagenesis Kit (Agilent Technologies) and
confirmed by DNA sequencing. For Polθ rescue experiments (Fig. 4g, h and
Extended Data Fig. 3d, e), POLQ cDNA constructs resistant to siPolθ1 were
generated into the pOZ-C-Flag-HA vector and the constructs were stably expressed
in indicated cell line by retroviral transduction. The Polθ ATPase
catalytically-dead mutant (A-dead) was generated by mutating the Walker A and B
motifs (Walker A: K121A and Walker B: D216A, E217A). pOZ-C-Flag-HA Polθ
constructs were generated for retroviral transduction, and stable cells were
selected using magnetic Dynabeads (Life Technologies) conjugated to the IL2R
antibody (Millipore).

siRNA and shRNA sequence information. For siRNA-mediated knockdown, the
following target sequences were used: POLQ (Qiagen POLQ_1 used as siPolθ1 and
Qiagen POLQ_6 used as siPolθ2); BRCA1 (Qiagen BRCA1_13); PARP1 (Qiagen
PARP1_6); REV1 (5′-CAGCGCAUCUGUGCCAAAGAA-TT-3′); BRCA2
(5′-GAAGAAUGCAGGUUUAAUATT-3′); BLM (5′-AUCAGCUAGAGGCGAUCAATT-3′);
FANCD2 (5′-GGAGAUUGAUGGUCUACUATT-3′) and PARI
(5′-AGGACACAUGUAAAGGGAUUGUCUATT-3′). AllStars negative control siRNA
(Qiagen) served as the negative control. ShRNAs targeting human FANCD2 were
previously generated in the pTRIP/DU3-MND-GFP vector”. ShRNAs targeting
human POLQ (CGGGCCTCTTTAGATATAAAT), human BRCA2
(AAGAAGAATGCAGGTTTAATA) or control (Scr, scramble) were generated in the pLKO-1
vector. POLQ (V2THS_198349) and non-silencing TRIPZ-RFP
doxycycline-inducible shRNA were purchased from Open Biosystems. All shRNAs were
transduced using lentivirus.

Immunoblot analysis, fractionation and pull-down assays. Cells were lysed with
1% NP-40 lysis buffer (1% NP-40, 300 mM NaCl, 0.1 mM EDTA, 50 mM Tris (pH
7.5)) supplemented with protease inhibitor cocktail (Roche), resolved by NuPAGE
(Invitrogen) gels, and transferred onto nitrocellulose membrane, followed by
detection using the LAS-4000 Imaging system (GE Healthcare Life Sciences). For
immunoprecipitation, cells were lysed with 300 mM NaCl lysis buffer, and the
lysates were diluted to 150 mM NaCl before immunoprecipitation. Lysates were
incubated with anti-Flag agarose resin (Sigma) followed by washes with 150 mM
NaCl buffer. In vitro transcription and translation reactions were carried out
using the TNT T7 Quick Coupled Transcription-Translation System (Promega). For
cellular fractionation, cells were incubated with low-salt permeabilization
buffer (10 mM Tris (pH 7.3), 10 mM KCl, 1.5 mM MgCl2) with protease inhibitor
on ice for 20 min. Following centrifugation, nuclei were resuspended in 0.2 M
HCl and the soluble fraction was neutralized with 1 M Tris-HCl (pH 8.0). Nuclei
were lysed in 150 mM NaCl and following centrifugation, the chromatin pellet
was digested by micrococcal nuclease (Roche) for 5 min at room temperature.
Recombinant GST-RAD51 and GST-PCNA fusion proteins were expressed in BL21
strain and purified using glutathione-Sepharose beads (GE Healthcare) as
previously described’’. Beads with equal amount of GST or GST-RAD51 were
incubated with in vitro translated Flag-tagged Polθ variants in 150 mM NaCl
lysis buffer.

Antibodies and chemicals. Antibodies used in this study included: anti-PCNA
(PC-10), anti-FANCD2 (FI-17), anti-RAD51 (H-92), anti-GST (B14), anti-histone
H3 (FL-136) and anti-vinculin (H-10) (Santa Cruz); anti-Flag (M2) (Sigma);
anti-pS317CHK1 (2344), anti-pT68CHK2 (2661) (Cell Signaling); anti-pS824KAP-1
(A300-767A) (Bethyl); anti-pS317γH2AX (05636) (Millipore); anti-pS15p53 (ab1431)
and anti-Polθ (ab80906) (Abcam); anti-BrdU (555627) (BD Pharmingen). Mitomycin
C (MMC), cis-diamminedichloroplatinum(II) (cisplatin, CDDP), and hydroxyurea
(HU) were purchased from Sigma. The PARPi rucaparib (AG-014699) was
purchased from Selleckchem and ABT-888 from AbbVie. Rucaparib was used for all
in vitro assays and ABT-888 was used for all in vivo experiments.

Chromosomal breakage analysis. 293T and VU 423 cells were twice-transfected
with siRNAs for 48 h and incubated for 48 h with or without the indicated
concentrations of MMC. For complementation studies on 293T shFANCD2, POLQ cDNA
constructs were transfected 24 h after the first siRNA transfection. Cells were
exposed for 2 h to 100 ng ml−1 of colcemid and treated with a hypotonic
solution (0.075 M KCl) for 20 min and fixed with 3:1 methanol/acetic acid.
Slides were stained with Wright’s stain and 50 metaphase spreads were scored
for aberrations. The relative number of chromosomal breaks was calculated
relative to control cells (si Scr). Radial figures were excluded from the
analysis for clarity in Fig. 4b.

Reporter assays and immunofluorescence. HR and alt-EJ efficiency was measured
using the DR-GFP (HR efficiency) and the alt-EJ reporter assay, performed as
previously described’*””**. Briefly, 48 h before transfection of SceI cDNA,
U2OS-DR-GFP cells were transfected with the indicated siRNA or treated with
PARPi (1 μM). The HR activity was determined by FACS quantification of viable
GFP-positive cells 96 h after SceI was transfected. For RAD51
immunofluorescence experiments, cells were transfected with indicated siRNA
48 h before treatment with HU (2 mM) or IR (10 Gy). For complementation
studies, Polθ cDNA constructs were either transfected 24 h after siRNA
transfection (Fig. 2b, c and Extended Data Fig. 9b) or stably expressed in the
indicated cell line (Extended Data Fig. 3d, e). 6 h after HU or IR treatment,
cells were fixed with 4% paraformaldehyde for 10 min at room temperature,
followed by extraction with 0.3% Triton X-100 for 10 min on ice. Antibody
staining was performed at room temperature for 1 h. For quantification of RAD51
foci in BrdU positive cells, cells were transfected with indicated siRNA 48 h
before treatment with IR (10 Gy). Then 2 h after IR treatment, cells were
treated with BrdU pulse (10 μM) for 2 h and subsequently fixed with 4%
paraformaldehyde and stained for RAD51 as described above. Cells were then
fixed in ethanol (4 °C, overnight), treated with 1.5 M HCl for 30 min and
stained with BrdU antibody. The relative number of cells with more than 10
RAD51 foci was calculated relative to control cells (si Scr). Statistical
differences between cells transfected with siRNAs (si Polθ1, si Polθ2,
si BRCA2, si PARI or si BLM) relative to control (si Scr) were assessed. For
GFP fluorescence, cells were grown on coverslips, treated with UV (24 h after
GFP-Polθ transfection; 20 J m−2), fixed with 4% paraformaldehyde for 10 min at
25 °C 4 h after the UV treatment, washed three times with PBS and mounted with
DAPI-containing mounting medium (Vector Laboratories). When indicated, cells
were treated with PARPi (1 μM) 24 h before GFP-Polθ transfection. Images were
captured using a Zeiss AX10 fluorescence microscope and AxioVision software.
Cells with GFP foci were quantified by counting number of cells with more than
five foci. At least 150 cells were counted for each sample.

Cell survival assays. For assessing cellular cytotoxicity, cells were seeded
into 96-well plates at a density of 1,000 cells per well. Cytotoxic drugs were
serially diluted in media and added to the wells. At 72 h, CellTiter-Glo
reagent (Promega) was added to the wells and the plates were scanned using a
luminescence microplate reader. Survival at each drug concentration was plotted
as a percentage of the survival in drug-free media. Each data point on the
graph represents the average of three measurements, and the error bars
represent the standard deviation. For clonogenic survival, 1,000 cells per well
were seeded into 6-well plates and treated with cytotoxic drugs the next day.
For MMC and PARPi, cells were treated continuously with indicated drug
concentrations. For CDDP, cells were treated for 24 h and cultured for 14 days
in drug-free media. Colony formation was scored 14 days after treatment using
0.5% (w/v) crystal violet in methanol. Survival curves were expressed as a
percentage ± s.e.m. over three independent experiments of colonies formed
relative to the DMSO-treated control.
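
The survival calculation described above (each drug concentration expressed as a percentage of the drug-free signal, with the mean and standard deviation of three measurements) can be sketched as follows. This is an illustrative sketch only: the function name and the luminescence readings are hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def percent_survival(treated, untreated):
    """Return (mean, s.d.) of treated readings as a percentage of the drug-free mean."""
    baseline = mean(untreated)
    pct = [100.0 * t / baseline for t in treated]
    return mean(pct), stdev(pct)  # assumption: sample s.d. across replicates


# Hypothetical triplicate luminescence readings (arbitrary units)
drug_free = [1000.0, 1100.0, 900.0]
dosed = [480.0, 500.0, 520.0]
avg, sd = percent_survival(dosed, drug_free)
```

The same normalization applies to the clonogenic assay, with colony counts in place of luminescence and the DMSO-treated wells as the baseline.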

Cell cycle analysis. A2780 cells expressing Scr or Polθ shRNA were synchronized
by a double thymidine block (Sigma) and subsequently exposed to MMC (1 μg ml−1
for 2 h), IR (10 Gy) or HU (2 mM, overnight). At the indicated time points
following drug release, cells were fixed in chilled 70% ethanol, stored
overnight at −20 °C, washed with PBS, and resuspended in propidium iodide. A
fraction of those cells was analysed by immunoblotting for DNA damage response
proteins. The immunoblot analysis of γH2AX shows staining after 0, 24, 48 and
72 h of HU treatment. For proliferation experiments, cells were incubated with
5-ethynyl-2′-deoxyuridine (EdU) (10 μM) for 1 h at each time point after MMC
exposure (1 μg ml−1 for 2 h). Cells were washed and resuspended in culture
medium for 2 h before being analysed by flow cytometry. EdU staining was
performed using the Click-iT EdU kit (Life Technologies).

DNA fibre analysis. A2780 cells expressing Scr or Polθ shRNA were incubated
with 25 µM chlorodeoxyuridine (CldU) (Sigma, C6891) for 20 min. Cells were then
treated with 2 mM hydroxyurea (HU) for 2 h and incubated in 250 µM iododeoxyuridine
(IdU) (Sigma, I7125) for 25 min after washout of the drug. Spreading of
DNA fibres on glass slides was done as previously reported. Glass slides were then
washed in distilled water and in 2.5 M HCl for 80 min followed by three washes in
PBS. The slides were incubated for 1 h in blocking buffer (PBS with 1% BSA and
0.1% NP-40) and then for 2 h in rat anti-BrdU antibody (1:250, Abcam, ab6326).
After washing with blocking buffer, the slides were incubated for 2 h in goat anti-rat
Alexa 488 antibody (1:1,000, Life Technologies, A-11006). The slides were then
washed with PBS and 0.1% NP-40 and then incubated for 2 h with mouse anti-BrdU
antibody diluted in blocking buffer (1:100, BD Biosciences, 347580). Following
an additional wash with PBS and 0.1% NP-40, the fibres were stained for 2 h with
chicken anti-mouse Alexa 594 (1:1,000, Life Technologies, A-21201). At least 150
fibres were counted per condition. Pictures were taken with an Olympus confocal
microscope and the fibres were analysed with ImageJ software. The number of stalled
or collapsed forks was measured as the number of DNA fibres that had incorporated
only CldU. The number of stalled or collapsed forks counted in Polθ-depleted cells is
expressed as fold-change after HU treatment relative to the fold-change observed in
control cells, which was arbitrarily set to 1.

SupF mutagenesis assay. 293T cells twice-transfected with siRNAs for 48 h were
then transfected with undamaged or damaged (UVC, 1,000 J m⁻²) pSP189 plasmids
using GeneJuice (Novagen). After 48 h, plasmid DNA was isolated with a miniprep
kit (Promega) and digested with DpnI. After ethanol precipitation, extracted plasmids
were transformed into the β-galactosidase⁻ MBM7070 indicator strain through
electroporation (Gene Pulser Xcell; Bio-Rad) and plated onto LB plates containing
1 mM IPTG, 100 µg ml⁻¹ 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside and
100 µg ml⁻¹ ampicillin. White and blue colonies were scored using ImageJ software,
and the mutation frequency was calculated as the ratio of white (mutant) to total
(white plus blue) colonies.
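
The mutation-frequency ratio defined above is simple enough to express directly. The sketch below is illustrative only; the function name and the colony counts are hypothetical.

```python
def mutation_frequency(white, blue):
    """Ratio of white (mutant) colonies to total (white plus blue) colonies."""
    total = white + blue
    if total == 0:
        raise ValueError("no colonies were scored")
    return white / total


# Hypothetical plate counts, for illustration only.
frequency = mutation_frequency(white=12, blue=2988)
print(f"mutation frequency = {frequency:.4f}")
```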

POLQ gene expression. RNA samples extracted using the TRIzol reagent (Invitrogen)
were reverse transcribed using the Transcriptor Reverse Transcriptase kit
(Roche) and oligo dT primers. The resulting cDNA was used to analyse POLQ expression
by RT-qPCR using QuantiTect SYBR Green (Qiagen) in an iCycler machine
(Bio-Rad). POLQ gene expression values were normalized to expression of the
housekeeping gene GAPDH, using the ΔCt method, and are shown on a log₂ scale.
The primers used for POLQ are as follows: POLQ primer 1 (forward:
5′-TATCTGCTGGAACTTTTGCTGA-3′; reverse: 5′-CTCACACCATTTCTTTGATGGA-3′);
POLQ primer 2 (forward: 5′-CTACAAGTGAAGGGAGATGAGG-3′; reverse:
5′-TCAGAGGGTTTCACCAATCC-3′).
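
As a rough sketch of the ΔCt normalization mentioned above: ΔCt is the target Ct minus the reference (GAPDH) Ct, and relative expression is 2 raised to −ΔCt. The function names and Ct values below are hypothetical, and the calculation assumes roughly 100% amplification efficiency for both primer pairs, which the text does not state.

```python
def delta_ct(ct_target, ct_reference):
    """Delta-Ct: target Ct minus reference-gene Ct from the same sample."""
    return ct_target - ct_reference


def relative_expression(ct_target, ct_reference):
    """Expression relative to the reference gene, assuming doubling per cycle."""
    return 2.0 ** (-delta_ct(ct_target, ct_reference))


# Hypothetical Ct values for POLQ and GAPDH in one sample.
polq_ct, gapdh_ct = 26.0, 20.0
print(delta_ct(polq_ct, gapdh_ct))             # delta-Ct in cycles
print(relative_expression(polq_ct, gapdh_ct))  # linear-scale value
```

On a base-2 logarithmic axis the plotted value is simply −ΔCt, so a one-cycle difference corresponds to a twofold change in expression.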

Polθ purification from insect SF9 cells. A Polθ fragment (ΔPol2) containing the
ATPase domain with a RAD51 binding site (amino acids 1 to 1,000) was cloned
into pFastBac-C-Flag and purified from baculovirus-infected SF9 insect cells as
previously described. Briefly, SF9 cells were seeded in 15-cm dishes at 80–90%
confluency and infected with baculovirus. Three days post-infection, cells were collected
and lysed in 500 mM NaCl lysis buffer (500 mM NaCl, 0.01% NP-40, 0.2 mM
EDTA, 20% glycerol, 1 mM DTT, 0.2 mM PMSF, 20 mM Tris (pH 7.6)) supplemented
with Halt protease inhibitor cocktail (Thermo Scientific) and calpain I
inhibitor (Roche) and the protein was eluted in lysis buffer supplemented with
0.2 mg ml⁻¹ of Flag peptide (Sigma). The protein was concentrated in lysis buffer
using 10 kDa centrifugal filters (Amicon). The protein was quantified by comparing
its staining intensity (Coomassie-R250) with that of BSA standards in a 7%
Tris-glycine SDS-PAGE gel. Purified protein was flash-frozen in small aliquots in
liquid nitrogen and stored at −80 °C.

Radiometric ATPase assay. Each 10 µl reaction consisted of 200 nM ATP, reaction
buffer (20 mM Tris-HCl (pH 7.6), 5 mM MgCl₂, 0.05 mg ml⁻¹ BSA, 1 mM DTT),
and 5 µCi of [γ-³²P]ATP. For corresponding reactions, ssDNA, dsDNA, and forked
DNA were added to the reaction in excess at a final concentration of 600 nM. Once
all of the non-enzymatic reagents were combined, recombinant Polθ was added to
start the ATPase reaction. After incubation for 90 min at room temperature, stop
buffer (125 mM EDTA (pH 8.0)) was added and approximately 0.05 µCi was spotted
onto PEI-coated thin-layer chromatography (TLC) plates (Sigma). Unhydrolysed
[γ-³²P]ATP was separated from the released inorganic phosphate [³²Pᵢ] with 1 M
acetic acid, 0.25 M lithium chloride as the mobile phase. TLC plates were exposed
to a phosphor screen and imaged with the BioRad Imager PMC. ssDNA, dsDNA,
and forked DNA were generated as previously described. To remove any contaminating
ssDNA, dsDNA and forked DNA were gel purified after annealing. Spots
corresponding to [γ-³²P]ATP and the released inorganic phosphate [³²Pᵢ] were
quantified (in units of pixel intensity) and the fraction of ATP hydrolysed was
calculated for each Polθ concentration.
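
One plausible way to compute the fraction of ATP hydrolysed from the two quantified spots is the phosphate signal divided by the summed signal of both spots. The text does not spell out the formula, so the sketch below is an assumption; the function name and pixel intensities are hypothetical, and background subtraction is omitted.

```python
def fraction_hydrolysed(phosphate_signal, atp_signal):
    """Released-phosphate signal as a fraction of total (phosphate + ATP) signal."""
    total = phosphate_signal + atp_signal
    if total <= 0:
        raise ValueError("total signal must be positive")
    return phosphate_signal / total


# Hypothetical pixel intensities for one TLC lane.
print(fraction_hydrolysed(phosphate_signal=2500.0, atp_signal=7500.0))
```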

Electrophoretic mobility gel shift assay (EMSA). Binding of Polθ to ssDNA was
assessed using EMSA. 60-mer single-stranded DNA (ssDNA) or double-stranded
DNA (dsDNA) oligonucleotides (5 nM) were incubated with increasing amounts of
Polθ (0, 5, 10, 50 or 100 nM) in 10 µl of binding buffer (20 mM HEPES-K⁺ (pH 7.6),
5 mM magnesium acetate, 0.1 µg µl⁻¹ BSA, 5% glycerol, 1 mM DTT, 0.2 mM EDTA,
and 0.01% NP-40) for 1 h on ice. Polθ protein was added at a tenfold dilution so that
the final salt concentration was approximately 50 mM NaCl. The ssDNA probes
were 5′ fluorescently labelled with IRDye-700 (IDT). After incubation, the samples
were analysed on a 5% native polyacrylamide/0.5× TBE gel at 4 °C. A fluorescent
imager (Li-Cor) was used to visualize the samples in the gel.

RAD51 purification. Human GST-RAD51 was purified from bacteria as described.
Xenopus Rad51 (xRad51) was purified as follows. N-terminally His-tagged SUMO- 
Rad51 was expressed in BL21 pLysS cells. Three hours after induction with 1 mM 
IPTG, cells were collected and resuspended in buffer A (50 mM Tris-Cl (pH 7.5), 
350 mM NaCl, 25% sucrose, 5 mM β-mercaptoethanol, 1 mM PMSF and 10 mM
imidazole). Cells were lysed by supplementation with Triton X-100 (0.2% final 
concentration), three freeze-thaw cycles and sonication (20 pulses at 40% efficiency). 
The soluble fraction was separated by centrifugation and incubated with 2 ml of 
Ni-NTA resin (Qiagen) for 1 h at 4 °C. After washing the resin with 100 ml of wash 
buffer (buffer A supplemented with 1 M NaCl, final concentration), the salt con- 
centration was brought down to 350 mM. His-SUMO-Rad51 was eluted with a 
linear gradient of imidazole from 10-300 mM in buffer A. Eluted fractions were 
analysed by SDS-PAGE. His-SUMO-Rad51 containing fractions were pooled and 
supplemented with Ulp1 protease to cleave the His-SUMO tag and dialysed over- 
night into buffer B (50 mM Tris-Cl (pH 7.5), 350 mM NaCl, 25% sucrose, 10% glycerol,
5 mM β-mercaptoethanol, 10 mM imidazole and 0.05% Triton X-100). The
dialysed fraction was incubated with Ni-NTA resin for 1 h at 4 °C and the Rad51
containing flow-through fraction was collected and dialysed overnight into buffer 
C (100 mM potassium phosphate (pH 6.8), 150 mM NaCl, 10% glycerol, 0.5 mM 
DTT and 0.01% Triton-X). Rad51 was further purified by hydroxyapatite (Bio-Rad) 
chromatography. After washing with ten column volumes of buffer C, Rad51 was 
eluted with a linear gradient of potassium phosphate (pH 6.8) from 100-800 mM. 
Rad51 containing fractions were analysed by SDS-PAGE and dialysed into stor- 
age buffer (20 mM HEPES-KOH (pH 7.4), 150 mM NaCl, 10% glycerol, 0.5 mM 
DTT). Purified protein was flash-frozen in small aliquots in liquid nitrogen and 
stored at −80 °C.

D-loop assay. D-loop formation assays were performed using xRad51 and conducted
as previously described. Briefly, nucleofilaments were first formed by incubating
RAD51 (1 µM) with end-labelled 90-mer ssDNA (3 µM nt) at 37 °C for 10 min
in reaction buffer containing 20 mM HEPES-KOH (pH 7.4), 1 mM ATP, 1 mM
MgCl₂, 1 mM DTT, BSA (100 µg ml⁻¹), 20 mM phosphocreatine and creatine
phosphokinase (20 µg ml⁻¹). After the 10 min incubation, increasing amounts of Polθ
(0, 0.1, 0.5 or 1.0 µM) and RPA (200 nM) were added and incubated for an additional
15 min at 37 °C. The reaction was then supplemented with 1 mM CaCl₂, followed
by further incubation at 37 °C for 15 min. D-loop formation was initiated by
the addition of supercoiled dsDNA (pBS-KS(−), 79 µM bp) and incubation at 37 °C
for 15 min. D-loops were analysed by electrophoresis on a 0.9% agarose gel after
deproteinization. The gel was dried and exposed to a PhosphoImager (GE Healthcare)
screen for quantification.

Substitution peptide arrays and RAD51-ssDNA filament experiments. Substitution
peptide arrays were performed as previously described. RAD51 displacement assays
were performed as follows. Binding reactions (10 µl) containing 5′-³²P-end-labelled
DNA substrates (0.5 ng of 60-mer ssDNA) and various amounts of human RAD51
and/or Polθ in binding buffer (40 mM Tris-HCl (pH 7.5), 50 mM NaCl, 10 mM
KCl, 2 mM DTT, 5 mM ATP, 5 mM MgCl₂, 1 mM DTT, 100 mg ml⁻¹ BSA) were
conducted at room temperature. After 5 min incubation with Polθ and a further
5 min incubation with RAD51 or vice versa, an equimolar amount of cold DNA
substrate was added to the reaction. Products were then analysed by electrophoresis
through 10% PAGE (200 V for 40 min in 0.5× Tris-borate-EDTA buffer) and
visualized by autoradiography.

Interbreeding of the Fancd2 and Polq mice. For the characterization of Fancd2/
Polq conditional knockouts, we crossed C57BL/6J mice (Jackson Laboratory).
Fancd2⁺/⁻ Polq⁺/⁺ mice, previously generated in our laboratory, were crossed with
Fancd2⁺/⁺ Polq⁺/⁻ mice to generate Fancd2⁺/⁻ Polq⁺/⁻ mice. These double-heterozygous
mice were then interbred, and the offspring from these mating pairs were
genotyped using PCR primers for Fancd2 and Polq. A statistical comparison of the
observed with the predicted genotypes was performed using a two-sided Fisher's
exact test. Primary MEFs were generated from E13.5 to E15 embryos and cultured
in RPMI supplemented with 15% fetal bovine serum and 1% penicillin-streptomycin.
All data generated in the study were extracted from experiments performed on
primary MEFs from passage 1 to passage 4. The primers used for mouse genotyping
are as follows: Fancd2 PCR primers OST2cF (5′-CATGCATATAGGAACCCGAAGG-3′),
OST2aR (5′-CAGGACCTTTGGAGAAGCAG-3′) and LTR2bF
(5′-GGCGTTACTTAAGCTAGCTTG-3′); Polq PCR primers IMR5973
(5′-TGCAGTGTACAGATGTTACTTTT-3′), IMR5974 (5′-TGGAGGTAGCATTTCTTCTC-3′),
IMR5975 (5′-TCACTAGGTTGGGGTTCTC-3′) and IMR5976
(5′-CATCAGAAGCTGACTCTAGAG-3′). Specific PCR conditions are available upon request.
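
The predicted genotypes for an intercross of double heterozygotes follow from Mendelian ratios. The sketch below computes expected offspring counts for the nine genotype classes. It assumes the two loci assort independently and uses a made-up total of 160 offspring; it only produces the expected counts and does not implement the two-sided Fisher's exact test used for the comparison itself.

```python
from fractions import Fraction
from itertools import product

# Mendelian single-locus ratios for a heterozygote x heterozygote cross.
SINGLE_LOCUS = {"+/+": Fraction(1, 4), "+/-": Fraction(1, 2), "-/-": Fraction(1, 4)}


def expected_counts(n_offspring):
    """Expected offspring per (Fancd2, Polq) genotype under independent assortment."""
    return {
        (fancd2, polq): float(p_f * p_q * n_offspring)
        for (fancd2, p_f), (polq, p_q) in product(SINGLE_LOCUS.items(), repeat=2)
    }


counts = expected_counts(160)  # 160 is a hypothetical total, not from the study
print(counts[("-/-", "-/-")])  # expected number of double-knockout offspring
```

Under these assumptions one offspring in sixteen is expected to be a double knockout, which is the baseline against which an under-representation of that genotype would be judged.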

Studies of xenograft-bearing CrTac:NCr-Foxn1nu mice. The Animal Resource
Facility at The Dana-Farber Cancer Institute approved all housing situations, treatments
and experiments using mice. No more than five mice were housed per air-filtered
cage with ad libitum access to standard diet and water, and were maintained
in a temperature- and light-controlled animal facility under pathogen-free conditions.
All mice described in this text were drug- and procedure-naive before the start
of the experiments. For every xenograft study, we subcutaneously implanted approx- 
imately 1.0 x 10° A2780 cells (1:1 in Matrigel Matrix, BD Biosciences) into both 
flanks of 6-8-week-old female CrTac:NCr-Foxn1nu mice (Taconic). Doxycycline 
(Sigma) was added to the food (625 p.p.m.) and twice weekly (Tuesday and Friday)
to the water (200 µg ml⁻¹) for mice bearing tumours that reached 100–200 mm³.
Roughly one week (5–6 days) after the addition of doxycycline to the diet, mice
were randomized to twice-daily treatment schedules with vehicle (0.9% NaCl) or
PARPi (ABT-888; 50 mg per kg body weight) by oral gavage for
the indicated number of weeks. Overall survival was determined using Kaplan–Meier
analyses performed with log-rank tests to assess differences in median survival
for each shRNA condition (shScr or shPolθ) and each treatment condition
(vehicle or PARPi) (GraphPad Prism 6 Software). For competition assays, A2780
cells expressing FANCD2-GFP shRNA (GFP-positive cells) or a combination of
FANCD2-GFP shRNA with (doxycycline-inducible) Scr-RFP or Polθ-RFP shRNA
(GFP-RFP-positive cells) were mixed at an equal ratio of GFP- to GFP-RFP-positive
cells, and thereafter injected into nude mice given doxycycline-containing
diets and treated with either vehicle or PARPi or CDDP. For competition assays,
mice received identical doxycycline and PARPi drug treatment. For the cisplatin
competition assay, mice were randomized into semi-weekly treatment regimens
with vehicle (0.9% NaCl) or CDDP (5 mg per kg body weight) by intraperitoneal
injection. After three to four weeks of treatment, mice were euthanized and tumours
were grown in vitro in the presence of doxycycline (2 µg ml⁻¹ for 4 days). The relative
ratio of GFP- to GFP-RFP-positive cells was determined by FACS analysis. Tumour
volumes were calculated bi-weekly from caliper measurements as (length × width²)/2.
Growth curves were plotted as the mean tumour volume (mm³) for each treatment
group; relative tumour volume (RTV) indicates the change in tumour volume at a
given time point relative to the tumour volume on the day of initial measurement
(volume of approximately 0.15 cm³), which was arbitrarily set to 1. Mice were assigned
to the different treatment groups in an unbiased manner. Drug treatment and outcome
assessment were performed in a blinded manner. Mice were monitored every day and
euthanized by CO₂ inhalation when tumour size (≥ 2 cm), tumour status (necrosis/
ulceration) or body weight loss (≥ 20%) reached the ethical endpoint, according to the
rules of the Animal Resource Facility at The Dana-Farber Cancer Institute.
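
The tumour-volume formula and the relative tumour volume (RTV) described above can be sketched as follows. The function names and caliper measurements are hypothetical; the formula itself, (length × width²)/2 with the first measurement set to 1, is taken from the text.

```python
def tumour_volume(length_mm, width_mm):
    """Caliper-based volume estimate in mm^3: (length x width^2) / 2."""
    return length_mm * width_mm ** 2 / 2.0


def relative_tumour_volumes(volumes):
    """Each volume divided by the volume at the first measurement (set to 1)."""
    baseline = volumes[0]
    return [v / baseline for v in volumes]


# Hypothetical (length, width) caliper readings in mm for one tumour over time.
readings = [(10.0, 6.0), (12.0, 8.0), (15.0, 12.0)]
volumes = [tumour_volume(length, width) for length, width in readings]
print(volumes)
print(relative_tumour_volumes(volumes))
```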

Immunohistochemical staining. We stained formalin-fixed paraffin-embedded
sections of harvested xenografts with antibodies specific for γH2AX (pSer139) (Upstate
Biotechnology) and Ki67 (Dako). At least two xenografts were scored for each treatment.
Tumours were collected 3 weeks after treatment. At least five 40× fields were
scored. The mean ± s.e.m. percentage of positive cells from five images in each
treatment group was calculated.

Statistical analysis. Unless stated otherwise, all data are represented as mean ± s.e.m.
over at least three independent experiments, and significance was calculated using 
the Student’s t-test. Asterisks indicate statistically significant (*P < 0.05; **P< 10°’; 
**P < 10°) values. All the in vivo experiments were run with at least 6 tumours 
from 6 mice for each condition. No statistical methods were used to predetermine 
sample size.

Extended Data Figure 1 | POLQ is highly expressed in epithelial ovarian
cancers (EOCs) and POLQ expression correlates with expression of HR
genes. a, b, Gene set enrichment analysis (GSEA) for expression of
TransLesion Synthesis (TLS) (a) and polymerase (b) genes between primary
cancers and control samples in 28 independent data sets from 19 different
cancer types. Enrichment values (represented as a single dot for each gene in a
defined data set) were determined using the rank metric score to compare
expression values between cancers and control samples. Dots above the dashed
line reflect enrichment in cancer samples, whereas dots below the dashed
line show gene expression enriched in control samples. Data sets were ranked
based on the amplitude of the rank metric score and plotted as shown. c, POLQ
gene expression in 40 independent data sets from 19 different cancer types.
For each data set, POLQ values were expressed as fold-change differences
relative to the mean expression in control samples, which was arbitrarily set
to 1. d, POLQ expression correlates with tumour grade and MKi67 gene
expression in the ovarian TCGA (n = 494 patients with ovarian carcinoma
(grade 1, n = 5; grade 2, n = 61; grade 3, n = 428) and control samples, n = 8).
e, POLQ expression correlates with tumour grade and MKi67 gene expression in
the ovarian data set GSE9891 (n = 251 patients with ovarian serous and
endometrioid carcinoma for which grade status was available (grade 1, n = 20;
grade 2, n = 88; grade 3, n = 143)). Statistical correlation was assessed using
the Pearson test (for d: r = 0.65, P < 10 °; for e: r = 0.77, P < 10°).
f, Top-ranked biological pathways differentially expressed between samples
expressing high levels of POLQ (high POLQ, first 33%, n = 95) relative to
samples with low POLQ expression (low POLQ, 67%, n = 190) on the ovarian
data set GSE9891 (n = 285 patients with ovarian carcinoma). Significance
values were determined by the hypergeometrical test using the 200 most
differentially expressed probe sets between the 2 groups (high POLQ and low
POLQ). g, GSEA for expression of DNA repair genes between primary cancers
and control samples in 5 independent ovarian cancer data sets. A representative
heat map showing differential gene expression between ovarian cancers and
controls is shown from GSE14407. For each data set, DNA repair genes
were ranked based on the metric score reflecting their enrichment in cancer
samples. The top 20 DNA repair genes primarily expressed in cancer samples
compared to control samples are shown on the right. h, GSEA for the top 20
DNA repair genes defined in g between primary cancers and control samples in
40 independent cancer data sets. The nominal P value was used as a measure of
the expression enrichment in cancer samples and represented as a waterfall
plot. When the gene set expression was enriched in control samples, the P value
was arbitrarily set to 1. i, POLQ expression correlates with RAD51 and FANCD2
gene expression in 285 samples from the ovarian data set GSE9891.
Statistical correlation was assessed using the Pearson test (r = 0.71, P < 10°).
j, Top 10 genes that most closely correlated with POLQ expression (gene
neighbours analysis) for 1,046 cell lines from the CCLE collection. DNA repair
activity for these genes is indicated in the table. Increased HR gene expression is
known to positively correlate with improved response to platinum-based
chemotherapy (a surrogate of HR deficiency) and thus can be predictive of
decreased HR activity. Conceptually, a state of HR deficiency may lead to
compensatory increased expression of other HR genes. k, Top-ranked Gene
Ontology (GO) terms for the molecular functions encoded by the top 20 DNA
repair genes defined in Extended Data Fig. 1g. l, Schematic representation of
Polθ domain structure with the helicases (BLM, RECQL4, RAD54B and
RAD54L) that co-expressed with Polθ (from Extended Data Fig. 1g).
Conserved amino-acid sequences of ATP binding and hydrolysis motifs
(namely Walker A and B) are indicated. Box plots in c show twenty-fifth to
seventy-fifth percentiles, with lines indicating the median, and whiskers
indicating the smallest and largest values. For d and e (top panels), each dot
represents the expression value from one patient; brackets show mean ± s.e.m.

Extended Data Figure 2 | Polθ is a RAD51-interacting protein required for
maintenance of genomic stability. a, siRNA sequences (siPolθ1 and siPolθ2)
efficiently downregulate exogenously transfected Polθ protein. Polθ levels
were detected by immunoblotting with Flag or Polθ antibody (left) and by
RT-qPCR using 2 different sets of POLQ primers (right). The asterisk on the
immunoblot indicates a non-specific band. Expression was normalized using
GAPDH as a reference gene. POLQ gene expression values are displayed as
fold-change differences relative to the mean expression in control cells, which
was arbitrarily set to 1. b, Quantification of baseline and HU-induced RAD51
foci in U2OS cells transfected with the indicated siRNA. c, Quantification
of baseline and HU-induced γH2AX foci in U2OS cells transfected with the
indicated siRNA. d, Quantification of IR-induced RAD51 foci in BrdU-positive
U2OS cells transfected with the indicated siRNA. e, Polθ inhibition by siRNA
induced a decrease in the cellular survival of 293T cells treated with MMC
in a 3-day survival assay. f, Quantification of chromosomal aberrations in 293T
cells transfected with the indicated siRNA. g, Schematic representation of Polθ
truncation proteins used for RAD51 interaction studies. h, Endogenous
RAD51 co-precipitates with Flag-tagged Polθ-ΔPol1 (Polθ-1–1,416) but not
Polθ-1633-Cter, each stably expressed in HeLa cells. i, Sequence alignment
between the RAD51-interacting motifs of C. elegans RFS-1 and human Polθ.
j, Schematic of Polθ domain structure with its homologues HELQ and POLN.
All data show mean ± s.e.m.

Extended Data Figure 3 | Characterization of RAD51-interacting motifs in
Polθ. a, Substitution peptide array probed with recombinant RAD51 and
analysed by immunoblotting. 20-mer peptides spanning each of the RAD51
binding sites (shown in Fig. 1g) were created in which each amino acid of
the original peptide was mutated to each of the 20 amino acids and RAD51
binding activity was tested. The amino acid change for each of the amino acids
of the RAD51-interacting domain of Polθ is shown on the right. Ponceau
staining was used to visualize the position of the peptides within the array.
b, GST-RAD51 pull-down with in vitro translated Polθ proteins missing
studies. d, Quantification of IR-induced RAD51 foci in U2OS cells stably
integrated with empty vector (EV) or Polθ-ΔPol1 cDNA that is refractory to
siPolθ1. Cells were transfected with indicated siRNA and subsequently treated
with IR. The number of cells with more than 10 RAD51 foci was calculated
relative to control cells (siScr). e, DR-GFP assay in U2OS cells stably integrated
with empty vector (EV) or indicated Polθ cDNA constructs refractory to
siPolθ1 and transfected with indicated siRNA. All data show mean ± s.e.m.

Extended Data Figure 4 | Polθ is an ATPase that suppresses RAD51-ssDNA
nucleofilament assembly and formation of RAD51-dependent D-loop
structures. a, Representative ΔPol2 wild-type radiometric ATPase assay. b, Gel
mobility shift assays with ΔPol2 wild type and ssDNA. c, Coomassie-stained
gel showing the purified ΔPol2-A-dead fragment. d, Representative ΔPol2-A-dead
radiometric ATPase assay. e, Quantification of ΔPol2-A-dead ATPase
activity. (ssDNA, single-stranded DNA; dsDNA, double-stranded DNA).
f, Assembly/disruption of RAD51-ssDNA filaments in the presence of
increasing amounts of ΔPol2 wild type. The order in which each component
was added to the reaction is noted above. g, Schematics of the formation of
RAD51-dependent D-loop structures. h, Formation of RAD51-containing
D-loop structures following the addition of increasing amounts of ΔPol2
wild type. i, Fraction of D-loop formed following the addition of increasing
amounts of ΔPol2 wild type. j, Effect of siPolθ and the different Polθ cDNA
constructs on HR read-out. NA, not applicable. Data in i show mean ± s.e.m.

Extended Data Figure 5 | Polθ functions under replicative stress and is
induced by HR deficiency. a, Polθ recruitment to the chromatin is enhanced
by UV treatment. HeLa cells stably integrated with either Flag-tagged ΔPol1 or
Polθ-1633-Cter (Extended Data Fig. 2g) were subjected to UV treatment.
Cells were collected at indicated time points after UV treatment and IPs were
performed on nuclear and chromatin fractions. b, HeLa cells stably integrated
with ΔPol1 were treated with UV and collected at the indicated time points
following UV exposure. Polθ and RAD51 co-precipitation is enhanced by UV
treatment. c, Quantification of DNA fibre lengths isolated from wild-type or
Polq⁻/⁻ MEFs. d, Quantification of DNA fibre lengths isolated from wild-type
or Polq⁻/⁻ MEFs transfected with either EV or Polθ cDNA constructs. e, POLQ
gene expression was analysed by RT-qPCR in HR-deficient ovarian cancer
cell lines (PEO-1 and UWB1-289) compared with other ovarian cancer cell
lines, HeLa (cervical cancer) cells and 293T (transformed human embryonic
kidney) cells. Expression was normalized using GAPDH gene as a reference.
POLQ expression values are displayed as fold-change relative to the mean
expression in HR-proficient control cells, which was arbitrarily set to 1. f, POLQ
gene expression analysis (RT-qPCR) in 293T cells transfected with siRNA
targeting FANCD2, BRCA1 or BRCA2 (left panel) and in corrected PD20 cells
(PD20 + FANCD2) relative to FANCD2-deficient cells (PD20) (right panel).
Expression was normalized using GAPDH gene as a reference. POLQ
expression values are presented as fold-change relative to the mean expression
in control cells, which was arbitrarily set to 1. g, POLQ gene expression in 5 data
sets of serous epithelial ovarian carcinoma (frequently associated with an
HR deficiency) and 1 data set of clear cell ovarian carcinoma (subgroup not
associated with HR alterations). For each data set, POLQ expression values are
displayed as fold-change differences relative to the mean expression in control
samples, which was arbitrarily set to 1. h, Progression-free survival (PFS)
after first-line platinum chemotherapy for patients with ovarian carcinoma
(ovarian carcinoma TCGA). Statistical significance was assessed by the
log-rank test (P< 10 7). i, Effect of Polθ expression levels and HR status on
tumour sensitivity to cisplatin or PARPi. NA, not applicable. Box plots in
c, d, and g show twenty-fifth to seventy-fifth percentiles, with lines indicating
the median, and whiskers indicating the smallest and largest values. Data in
e and f show mean ± s.e.m.

Extended Data Figure 6 | Polθ inhibition sensitizes HR-deficient tumours
to cytotoxic drug exposure. a–c, Clonogenic formation of A2780 cells
expressing scrambled (Scr) shRNA or shRNAs against FANCD2 or BRCA2
with increasing amounts of MMC (a), UV (b) or IR (c). d–f, Clonogenic
formation of A2780 cells expressing scrambled (Scr) or FANCD2 shRNA,
together with shRNA targeting Polθ, in increasing concentrations of
CDDP (d), MMC (e) or PARPi (f). g, Inhibition of Polθ reduces the survival
of A2780 cells after 3 days of continuous exposure to the ATM inhibitor
Ku55933. h, Immunoblot analyses in A2780 cells expressing FANCD2 shRNA
together with siRNA targeting Polθ or Scr at 24 h after indicated MMC pulse
treatment. i, FANCA-deficient fibroblasts (GM6418) were infected with a
whole-genome shRNA library and treated with MMC for 7 days. The fold-change
enrichment of each shRNA after MMC treatment was determined by
sequencing relative to the infected cells before treatment. TP53 depletion is
known to improve survival of FANCA⁻/⁻ cells. WRN depletion has recently
been shown to be synthetically lethal with HR deficiency. Each column
represents the mean of at least 2 independent shRNAs. All data show
mean ± s.e.m.

Extended Data Figure 7 | HR and Polθ repair pathways are synthetically
lethal in vivo. a, Genotype frequencies of offspring from interbred
Fancd2⁺/⁻ Polq⁺/⁻ mice. V, four Fancd2⁻/⁻ Polq⁻/⁻ offspring were observed
with several congenital malformations and premature death within 48 h of
birth. b, Description of Fancd2⁻/⁻ Polq⁻/⁻ offspring generated in the study. The
offspring presented congenital malformations (that is, eye defects) together
with reduced size and body weight. The arrow indicates absence of the right eye.
c, Genotype frequencies of E13.5 to E15 embryos (13.5 to 15 days post coitum)
from interbred Fancd2⁺/⁻ Polq⁺/⁻ mice. d, Description of congenital
malformations and their measured frequencies observed in E13.5 to E15
Fancd2⁻/⁻ Polq⁻/⁻ embryos generated in the study. e, Clonogenic formation of
wild-type, Fancd2⁻/⁻, Polq⁻/⁻ and Fancd2⁻/⁻ Polq⁻/⁻ MEFs with increasing
concentrations of PARPi. f, A2780 cells were transduced with indicated
shRNAs and xenotransplanted into both flanks of athymic nude mice. The
tumour volumes for individual mice were measured biweekly for 8 weeks.
Each group represents n = 5 tumours from n = 5 mice. g, Ki67 and
γH2AX quantification in tumours treated with either vehicle or PARPi.
h, Representative Ki67 and γH2AX staining of A2780-shFANCD2
xenografts expressing shScr or shPolθ in athymic nude mice, treated with
either vehicle or PARPi. Scale bars, 100 µm. i, In vivo competition assay
design. j, Tumour chimaerism post-xenotransplantation for indicated
conditions. k, Representative flow cytometry analysis of tumours before
xenotransplantation (post-FACS sorting) or after xenotransplantation (post-
transplant, PARPi). The percentage of GFP-RFP-positive cells is indicated.
l, Tumour chimaerism post-xenotransplantation for indicated conditions.
For data in j and l, each circle represents data from one tumour and each group
represents n = 7 tumours from n = 6 mice. Brackets show mean ± s.e.m.
Data in e–g show mean ± s.e.m. For f, each group represents n = 6 tumours
from n= 6 mice. 
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Extended Data Figure 8 | Pol@ is required for HR-deficient cell survival IR-induced RADS1 foci in U2OS cells transfected with indicated siRNA. 


and limits the formation of RAD51 structures in HR-deficient cells. e, RADS51 recruitment to chromatin is enhanced by UV treatment. VU 423 cells 
a, Clonogenic formation of Fancd2~’" Polq’’~ MEFs transfected with full- (BRCA2~'~) were collected at indicated time points after UV treatment and 
length POLQ cDNA constructs in the presence of increasing concentrations of | immunoblotting performed on the cytoplasmic, nuclear and chromatin 
PARPi. b, Chromosome breakage analysis of FANCD2-depleted cells that fractions. f, RAD51 recruitment to chromatin in VU 423 cells (BRCA2‘~) 


were first transfected with the indicated siRNA and full-length POLQ cDNA __ transfected with indicated siRNA. Histone H3 was used as a control for 
constructs refractory to siPol01 and then exposed to MMC. c, DR-GFP assay in _ chromatin fractionation. All data show mean + s.e.m. 
U20S cells transfected with indicated siRNA. d, Quantification of baseline and 

Extended Data Figure 10 | Model depicting the role of Polθ in DNA repair.
a, Mechanistic model for how Polθ limits RAD51-ssDNA filament assembly.
According to this model, the ATPase domain of Polθ may prevent the
assembly of RAD51 monomers into RAD51 polymers, perhaps by depleting
local ATP concentrations. The RAD51 binding domains in the central region of
Polθ may then sequester the RAD51 monomers, preventing filament assembly.
b, (I) Under physiological conditions, Polθ expression is low and its impact
on repair of DNA double-strand breaks (DSB) is limited. (II) When HR
deficiency occurs, Polθ is then highly expressed and channels DSB repair
towards alt-EJ. (III) In the case of an HR-defect, the loss of Polθ leads to cell
death through the persistence of toxic RAD51 intermediates and inhibition
of alt-EJ.

SOCIAL MEDIA 


A network boost 


How scientists can use Twitter to expand their social contacts and find jobs.


BY MONYA BAKER 


Information scientist Cassidy Sugimoto was
initially sceptical that Twitter was anything
more than a self-promotional time-sink.
But when she noticed that her graduate students
were receiving conference and co-authoring
invitations through connections made on Twitter,
she decided to give the social-media platform
a try. An exchange that began last year as short
posts, or ‘tweets’, relating to conference sessions
led to a new contact offering to help her negotiate
access to an internal data set from a large
scientific society. “Because we started the
conversation on Twitter, it allowed me to move
the conversation into the physical world,” says
Sugimoto, who studies how ideas are disseminated
among scientists at Indiana University in
Bloomington. “It’s allowed me to open up new
communities for discussions and increase the
interdisciplinarity of my research.”

Relatively few scientists are taking the 
opportunities Twitter offers. In a 2014 online 
Nature survey on social media habits, just 12% 
of the more than 3,000 scientists and engineers 
who responded reported that they used Twit- 
ter regularly (see Nature 512, 126-129; 2014). 
By contrast, the Pew Research Center, a non- 
partisan think tank based in Washington DC, 
found that almost one-quarter of all US adults 
with Internet access are on Twitter. Research- 
ers in computation-intensive disciplines such 
as astrophysics tend to use the service more, 
but no estimates suggest that any discipline of 
scholars is using Twitter at a higher rate than 
the general public, says Sugimoto. 

That leaves much networking potential 
untapped, say Twitter enthusiasts. The oppor- 
tunities for microblogging — posting brief, 
regular updates online — are plentiful and 
far-reaching, and can help young scientists to 
build their careers. Following thought leaders 
and relevant organizations is an effective, easy 
way for researchers to learn about important 
papers, events, funding sources, potential col- 
leagues and job opportunities. Scientists who 
tweet report that they receive invitations to 
speak at conferences and events, and make last- 
ing professional connections. There are down- 
sides; scholars need to manage their online time 
and reputations effectively. Still, by strategically 
selecting whom to follow and what to contrib- 
ute on Twitter, young researchers can build a 
powerful virtual network that will yield oppor- 
tunities, information and advice. 


CREATE CONNECTIONS 

People who use Twitter may do so as active par- 
ticipants, posting anything that can fit into 140 
characters, and also as followers who read these 
tweets (see ‘On Twitter but not tweeting’). More 
than one million users follow CERN, the par- 
ticle-physics laboratory near Geneva, Switzer- 
land, for example. Participants often follow 100 
or more users, and so constantly receive posts in 
their Twitter feed from researchers outside their 
own immediate networks. Users can also search 
for posts on a particular topic using the hash 
symbol followed by a keyword, and curate their 
connections for discussions relevant to their 
interests and careers. “By following the people 
you find interesting and may want to work
with, you're among the first to know when
they have an open position within their labs or 
institutions,” says Jacob Jolij, a neuroscientist at 
the University of Groningen in the Netherlands. 
For instance, if a potential supervisor or peer is 
moving lab, he or she may announce it on Twit- 
ter. Such posts can suggest that labs and institu- 
tions may soon be looking for new employees. 

When Thea Whitman was a doctoral student 
in soil science at Cornell University in Ithaca, 
New York, in 2013, she forwarded, or retweeted, 
a post from another soil researcher announc- 
ing a tenure-track opening at the University 
of Wisconsin–Madison. She did not think of
herself as a candidate: she was considering 
options for a postdoctoral position. But when 
she looked again at the posting, she realized 
that her interests and qualifications matched 
the job requirements. She sent in an applica- 
tion, landed an interview and got the job. (She 
starts in January 2016 after finishing a postdoc 
stint at the University of California, Berkeley.) 

One chemist who writes a blog and tweets 
under the username Chemjobber about work 
in the US chemical and drug industry esti- 
mates that he tweets three to five positions a 
day. He receives two or three notes a year from 
readers who tell him that they found the advert 
for their new job through one of his posts. 

Although plenty of job announcements 
made on Twitter fail to attract suitable can- 
didates, tweeting and retweeting can help to 
expand recruitment efforts. In December, 
Matthew MacManes, a genomic biologist at 
the University of New Hampshire, Durham, 
tweeted a link for a tenure-track position in his 
department. Within 2 weeks, retweets brought 
the posting to more than 10,000 Twitter users, 
and some 200 viewed a description of the 
position, he says. “These are candidates that I 
wouldn't have otherwise reached.”


FIND AN EMPLOYER — OR EMPLOYEE 
Twitter is not the primary way that young 
scientists find jobs, however. Gwynn Benner, 
who coordinates career services for postdocs 
and graduate students at the University of Cali- 
fornia, Davis, says that she often sees a mismatch 
in Twitter usage at career fairs. “The employers 
will be tweeting, ‘I've got a booth’, and the
students are just not on Twitter.” A steady stream
of tweets come from @naturejobs, @ScienceCareers,
university career offices, aggregators
and employers, but Benner thinks that Twitter
could be overwhelming as a primary tool in a
job search. Instead, she says, it should be used
strategically, to learn what potential employers
are up to, and whether they have job openings.
“Say there are five companies I want to target,”
she says. “That's when I get on Twitter.”

Twitter can help in early-career researchers'
job searches by allowing them to see other
users’ previous posts and current connections.
Arne Bakker at the Stanford University Career
Development Center in California provides
advice for people with science PhDs and for
postdocs, and says that following institutions,
companies and individuals on Twitter can offer
clues about workplace culture and ongoing
projects in a way that static websites do not.
That knowledge can be especially helpful during
a job interview, he says. It can also paint
a picture of what the job might be like, says
Whitman. “If your future adviser or colleagues
are active on Twitter, it can give you insight
into their personality. Do they tend to be negative?
Constructive? How do they respond to
criticism?”

Cassidy Sugimoto says that Twitter can help to
shape scientific networks.

Employers use Twitter to evaluate potential 
recruits as well. Evolutionary biologist Iain 
Couzin is setting up a department to study 
collective animal behaviour after moving to 
the Max Planck Institute for Ornithology in 
Konstanz, Germany. He says that Twitter is 
becoming a tool to help find excellent young 
scientists. “I get to know who many of the can- 
didates are as I have also been following them,” 
he says. Danielle Bassett, a bioengineer at the 
University of Pennsylvania, Philadelphia, says 
that although she has not used Twitter to recruit 
lab members directly, she does look at online 
activity; a history of tweets that demonstrate 
scientific insight and interdisciplinary interests 
has increased candidates’ chances, she says. 

But a Twitter account is not an automatic 
boost. “Some of my older colleagues think that 
if you are using social media, you don't have 
enough to do,” says Jessica McCarty, who studies
land use at Michigan Tech Research Institute
in Ann Arbor. “It is a double-edged sword,”
warns Jennifer Biddle, an assistant professor 
at the University of Delaware in Newark who 
studies environmental microorganisms. “If 
you are outspoken or mostly post about your 
personal life, you may create prejudgement.” 

Or worse: in one particularly controversial 
case, the University of Illinois at Urbana- 
Champaign rescinded a job offer for a 
tenure-track position after a candidate posted
inflammatory tweets. Some scientists, includ- 
ing Chemjobber, have opted to not use their 
real names to avoid potential conflicts with 
employers. 


THE NETWORK WAY 

Career consultants and university guidance 
counsellors interviewed by Nature Careers 
de-emphasized Twitter in favour of encourag- 
ing an online presence on LinkedIn (see Nature 
516, 441-442; 2014). Nonetheless, Twitter and 
other forms of social media are changing the 
playing field. Danielle N. Lee, an outreach 
advocate and postdoc in psychology at Cor- 
nell University, says that her blogposts and 
tweets about making science more inclusive 
for women and minorities have yielded pres- 
tigious speaking engagements and invitations 
to write articles for publication. 

Sugimoto thinks that social media may be 
starting to reweave the fabric of traditional aca- 
demic research. “I’m seeing students creating 
identities that don't have to be routed through 
the principal investigator,” she says. “I see doctoral
students making increasing use of Twitter
to brand themselves.” Although existing stud- 
ies on the topic are small and research methods 
are still being worked out, there is some sug- 
gestion that social media can have an equaliz- 
ing effect by making people without access to 
conventional networks more visible, she says. 

Twitter's value to job seekers is more about 
making connections than finding a newly 
advertised job, says Chemjobber. “The reason 
to get on Twitter for your job search is that it 
offers you a way to short-circuit traditional 
networking,” he says. “It doesn't matter if you're 
a full professor or a grad student or an early- 
career person, you can get noticed.” 

And the social-media platform helps
users to cross disciplines, says Hiroki Ueda,
a systems biologist at the RIKEN Center
for Developmental Biology in Kobe, Japan.
“Sometimes I get interested in PhD students
and postdocs especially from different
fields – chemistry, physics, information
science – just through their tweets.”

Twitter can enable conversations that would be
impossible in the physical world, says Caleph
Wilson, an immunologist at the University
of Pennsylvania. He participates in a Twitter
group that has launched weekly digital
conversations using the hashtag #BlackandSTEM.
The platform provides a forum where he can
share his experiences as an African American
working in science, technology, engineering
and mathematics (STEM), and younger
scientists can learn from them. “Physically, you
may be in a situation where you are the only
person [in an under-represented group]
but through social media, you are in a space
where you can have the all-important STEM
vent session,” he says.

Science exchanges on Twitter are generally
convivial, but there is no doubt that Twitter
can get ugly. In November, the leader of the
Rosetta Mission that landed a probe on a
comet wore a shirt printed with scantily clad
women. A science writer who tweeted that
the attire made astronomy less welcoming to
women received multiple tweets telling her
to kill herself.


NO PERSONAL POSTS

Although horrible tweets and abusive ‘trolls’
exist, they are not a significant part of most
scientists’ experience on Twitter, says Chris
Gunter, a researcher and science communicator
at Marcus Autism Center and Emory
University in Atlanta, Georgia. Those who
fear Twitter may not realize how much they
can control their experience. “You can unfollow
or mute people,” she says, “and you can
take a break for a while.” As a precaution,
she avoids inflammatory or overly personal
posts, such as using family members’ names.
For conversations that require nuance, users
should switch to other types of communication,
she says. It is common for interactions
that begin on Twitter to move over to e-mail,
for example.

Although conventions on social media are
still emerging, the basic rules of networking
still apply, says Lisa Balbes, a career-development
counsellor in Kirkwood, Missouri.
“It’s a weird, messy landscape right now,” she
says. “It comes down to building a relationship
with other people through whatever
tools they are using.” Relationships require
more than a single click. Twitter users should
not assume, for example, that being mutual
followers with another user means that the
person has taken an interest in helping them.

An online reputation for being thoughtful,
enterprising and helpful can be as valuable
as a long list of publications, says career
consultant Peter Fiske, head of PAX Water
Technologies in Richmond, California.
Scientific conference organizers and observers
often follow a meeting's tweetstream to learn
what generated excitement, and to find rising
stars. Informed tweets can help to draw their
attention, says Gunter. For the past several
years, she has chaired committees that select
speakers and moderators for the American
Society of Human Genetics in Bethesda,
Maryland. “The tweets alone can’t suggest a
good speaker,” she says, “but tweeting coherently
about the topic is always a good sign.”

Twitter's greatest advantage may be its flexibility
in terms of the time spent and level of
commitment. “You can dip your toes in – you
don't have to be a crazy twittermaniac,” says
Titus Brown, a bioinformatician at the University
of California, Davis. “In the past few
years, I’ve seen it grow considerably in professional
usefulness. It will continue to evolve,”
he predicts. “Find a way to use it in a way that
makes sense with your personality and time
constraints – and it will be useful for you.”


Monya Baker writes and edits for Nature’s
Careers section.


#SCIENCETRENDING


On Twitter but not tweeting


A common strategy on Twitter is lurking:
reading tweets but not posting them.
For many users, Twitter becomes
their main way to learn about relevant
papers, conferences and news. To build
that information feed, users need to
choose which streams to follow and
what hashtags to monitor, such as
#lifeafterPhD. Many follow relevant
departments in grant agencies;
@NIHfunding has 24,000 followers, for
example.

Many journals and journal editors
tweet their tables of contents and retweet
relevant comments. Beginners on
Twitter can also find accounts to follow
through retweeted posts and by looking
through followers of other users. Lists of
recommended people to follow abound
as well. Programmes such as TweetDeck
or Hootsuite can sort Twitter streams by
username and hashtag.

And Twitter is boosting the scope of
conferences, too, helping people who
cannot attend to follow what is going on. At
the American Geophysical Union meeting in
San Francisco, California, last December,
attendees numbered about 24,000, yet
more than 28,000 people posted almost
57,000 tweets and retweets with the
hashtag #agu14, double the previous
year. Specific sessions within conferences
often have their own hashtags, catering to
researchers’ specific interests.

High levels of Twitter activity can be
intimidating, so the best approach is
to read tweets selectively. Lisa Balbes,
a career-development consultant in
Kirkwood, Missouri, advises Twitter users
not to even try to check every post. She
thinks of Twitter as an additional source of
information and networking. “I skim the
headlines when I have a couple minutes,”
she says. M.B.

GOVERNMENT FUNDING


State contributions


Individual states funnelled US$1.8 billion
into research labs and studies in 2013,
with one-quarter of that devoted to basic
research, finds a survey by the National
Science Foundation in Arlington, Virginia.
Although federal funding for research
and development (R&D) dwarfs state
investments, state expenditures can help
to tailor workforces to regional needs,
says James Hearn, associate director of
the Institute of Higher Education at the
University of Georgia in Athens. Five states
together accounted for almost three-fifths
of the investments (see ‘Top R&D
spenders’). External R&D, mainly that at
academic institutions, tended to receive
more than the internal R&D conducted by
state agencies.


TOP R&D SPENDERS


California is top, but even Ohio, Texas and New York
spent as much as the bottom 25 states combined.

IMMIGRATION


Scientists gain access


Proposed federal legislation would exempt
scientists from some US immigration
quotas. Similar legislation introduced
by the Senate in 2013 failed to make it
through the House of Representatives.
However, Atessa Chehrazi, an immigration
attorney in San Francisco, California, says
that foreign researchers would gain many
more opportunities to work in the United
States if even targeted provisions of the bill
pass, such as a proposal to allow graduate
students who arrive on non-immigrant
visas to seek permanent resident status.
Restrictive employment quotas and visas
for scientists and other highly trained
workers have come under attack in the
past decade. More than a dozen higher-
education associations are urging Congress
to pass the bill.

SCIENCE FICTION


THE LAST ONE


What a waste.


BY IULIA GEORGESCU


I land amid a cloud of shattered
glass. The deserted pedestrian tunnel
that runs beneath the streets of Kensington
stretches ahead of me. A second later,
I am running fast into the walkway’s
murky depths, thanking
science for my light exoskeleton
and my employer for paying for
it. How did people manage to
do their jobs without enhancement
prosthetics? The question
flits through my mind as I rush
past an array of adverts that
would have been invisible in the
darkness, but for my smart contact
lenses. I catch a glimpse of
the poster promoting Extinct!
Objects That Are No More, the
new exhibition at the Science
Museum. The very place where
I have just acquired the item
ordered by my employer.

I check my sensors without
slowing my pace. No pursuit. Not
yet. I tighten my grip on the box
in my arms. It had been a risky
job. The security at the museum
was tight. Very tight. And for
what? A gift for his lovely granddaughter,
he says. Such a valuable
resource wasted as a toy for
a five-year-old! Like having a water fight in
the desert. I despise my filthy rich employer
and his eccentric job requests, but then, he
did pay for all of my enhancements. And he
got me out of jail. Twice.

I reach the end of the tunnel and stop
abruptly. Sensor check. There’s some movement
at the far end, where I came through
the roof. No time to waste. With one hand,
I lift a heavy manhole cover and slide into
the gap, pulling the lid closed behind me.
Hastily, I climb down the ladder, hurrying to
the disused tunnels that link to the London
Underground system. As I hit the ground, I
head for one of the larger passageways and
start running again.

Even with all the muscular implants, I
am getting tired. No time to stop. It’s been a
hell of a night. I've been lucky to make it this
far. Who would think that a museum would
have so many layers of security? Cameras,
sensors, drones, guards in the finest
exoskeletons. The
irony, of course, is that the objects on display,
now worth risking your skin for, were once
ubiquitous. The most common things in the
world. Even I remember seeing much of that
stuff when I was a kid. But after decades of
overconsumption and a planet stripped of
resources, stuff that once was worthless is
now worth a fortune.


I recheck my sensors. Five pursuers. 
Damn. Perhaps others that are shielded? 
Should I risk a deeper scan? Better to try 
to outrun them. If I make it to the meeting
point, I might even survive the night. Lucky 
the box is light. 

Run. 

Back at the exhibition I think I saw a plas- 
tic bottle like my mum used to have. She said 
that all drinks came in such containers and 
that cities overflowed with plastic garbage. 
Hard to imagine that anyone would throw 
away such valuable material. But that was in 
the age of waste, before people figured out 
how badly they needed such resources for 
the 3D printing industry. 

My sensors tell me that the pursuers are 
closing in. They've brought three drones. I 
can't outrun those. Desperately, I scan the 
map on my retinal display. I switch direc- 
tion and enter another tunnel. Three min- 
utes till the District line train reaches this 
section. Perhaps I can make it. My muscles 
are screaming, my vision is getting blurred,
but I keep moving. 

Part of my mind keeps going back to the 
exhibition. We made so many wrong choices, 
wasted so many opportunities, exhausted our 
resources, lost so much. Just like me and my 
life. Somehow I never had the 
strength to do the right things. 

I hear the Tube train long 
before the lights break into the 
tunnel. A few tens of seconds till 
it reaches me. I stop and face the 
oncoming lights. Focus. Count- 
ing. Three. Two. One. Just before 
the train hits me, I use my last 
drops of energy to jump high. 

I land on the roof of a carriage.
The exoskeleton on my legs 
cracks badly. I cling to the train 
with one hand and grip the box 
with the other. I see the num- 
bers on my retinal display. One 
minute forty-five seconds to the 
next stop. 

I jump to the platform before 
the train stops. The badly dam- 
aged exoskeleton somehow 
manages to keep my legs from 
breaking. Praise the maker. I 
start running, slower now. I don't 
even need the sensors to feel my 
pursuers closing in. I struggle up 
the stairs and see the faint light of 
morning outside. I’ve made it! 

But there’s no rescue waiting for me as I 
emerge into the dawn’s rays. Drones close in 
from behind. Ahead are four armed cops and 
a little bald man with old-fashioned glasses, 
whom my retinal display identifies as the 
museum curator. I spot a different style of 
drone descending slowly towards me. My 
bastard employer is abandoning me, but 
has sent his toy to recover his prize. Fine. 
Perhaps it’s time to quit this job anyway. I 
gather my little remaining strength to throw 
the box to the approaching drone. 

“Noooo! Don't shoot you idiots!” 

I hear the little man screaming and the 
guns firing. The box shatters and I fall on 
the cold, wet pavement. Blood tastes warm 
in my mouth as I hold out my arm. The red 
helium balloon rises gracefully into the 
morning sky.


Iulia Georgescu is an editor of Nature 
Physics. At work she reads science and on 
the daily commute on the Tube she imagines 
the science turning into science fiction. 